STATISTICS 


HIGHER SECONDARY 

FIRST YEAR 


VOL. II




TAMILNADU TEXTBOOK SOCIETY 


STATISTICS 


Vol. II 


Higher Secondary - First Year 




TAMILNADU TEXTBOOK SOCIETY 

MADRAS 


Government of Tamilnadu 
First Edition — 1980 


Editorial Board Chairman
(Author & Review Committee Members)

Thiru. M. Sankaranarayanan, M.A., B.Sc.,
Joint Director of Statistics,
Department of Statistics,
MADRAS - 600 006.

REVIEW COMMITTEE MEMBERS:

Thiru. T. K. Manickavachagam Pillai, M.A., L.T.,
Professor of Mathematics (Retd.),
A.C. College of Technology,
MADRAS - 600 035.

Thiru. R. Hanumantha Rao, M.A.,
Professor of Mathematics,
P.S.G. Arts College,
COIMBATORE.

Price: Rs. 8-00

This book has been printed on concessional paper of 60 G.S.M.
substance made available by the Government of India.

Printed at
SANKAR PRINTERS, MADRAS - 600 018.


CONTENTS

FIRST YEAR :: SECOND PAPER

1. Measures of Central Tendency
2. Measures of Dispersion
3. Correlation
4. Regression
5. Rank Correlation
6. Index Numbers


CHAPTER 1 


MEASURES OF CENTRAL TENDENCY


We have so far considered how a large mass of statistical
data can be condensed by means of tables and represented in
charts and graphs for easy understanding and comparison. But
tables, charts and graphs have their own limitations: different
distributions in the form of tables cannot be compared directly.


Suppose we have two tables showing the distribution of wages
of workers in two different factories. It may not be possible to
reach a definite conclusion by comparing the data directly. In
order to make the comparison easy and effective, we need a single
common measure which describes the characteristics of each set of
data. In any distribution, a few values may occur more frequently
and a few less frequently. The values which occur more frequently
usually lie in a particular part of the distribution, and in most
cases this is the central part. Hence a value from that part may be
taken as the central value of the distribution. The data in the
distribution have a tendency either to be equal to this central
value or to cluster around it. Hence the central value may be taken
as a measure of central tendency. As this measure indicates the
location of the distribution, it is also known as a Measure of Location.


Let us select 100 uniform plots, each measuring 10 m x 10 m,
in different parts of a taluk, harvest the paddy crop and record
the yield obtained from each plot. We can be sure that the yields
of the 100 plots will not all be equal; the yield varies from plot
to plot. There may be some extremely high yields due to better
application of fertilisers and other inputs. Similarly, in a few
cases the yield may be very poor due to lack of proper irrigation
facilities or due to pest attack. However, we may find that the
yields in the remaining cases, other than the extreme values, do
not differ much from one another. In other words, they cluster
around, or tend towards, some particular value. This particular
value, around which the other values cluster, may be called a
central value. Such a central value is known as a Measure of Central
Tendency or Measure of Location. Different frequency distributions
may have different values of central tendency, so frequency
distributions can differ from one another in this respect.

There are different measures of central tendency. They
are the Mean or Average, the Median and the Mode. The mean may be
classified into the Arithmetic Mean, the Geometric Mean and the
Harmonic Mean. The Arithmetic Mean may be of either of two
kinds, namely the Simple Mean and the Weighted Mean.


Characteristics of a good statistical average 


Though there are different types of averages, each average
has its own advantages and disadvantages, and hence different
averages are used on different occasions. No single average is
suitable for all purposes; we have to select the one best suited to
the occasion. An average which satisfies the following characteristics
can be considered the best:


1. It should be capable of being calculated by a well-defined
   mathematical formula.
2. It should be based on the values of all the items in the
   distribution.
3. Its value should not be unduly affected by extremely high or
   extremely low values. In other words, a few extreme values
   should not shift the average very far.
4. Its computation should be simple and it should be easy to
   understand.
5. It should be amenable to further algebraic treatment.
6. It should be stable: its value should not be affected much by
   small changes in the method of computation.


Arithmetic Mean (or) Mean (or) Average

Arithmetic Mean, Mean and Average are one and the same.
In Statistics, the term Mean is generally used. The term
'Average' is a familiar one and its computation is also
familiar and easy. The Mean can be calculated for (1) ungrouped data;
(2) discrete frequency distributions; and (3) continuous frequency
distributions.


A distribution may consist of a number of units or individuals.
To calculate the average, the values of the individuals are added
and their total is first computed. This total is then divided
equally by the number of units or individuals, and the result is
the average value. This is called the simple average, arithmetic
mean or simple mean, and the method is called the direct method.


Let us consider the following example:

The weights of five bundles are given in kg. Find the average
weight of one bundle.

Let x represent the weight of a bundle, and let the subscripts
1, 2, 3, 4 and 5 denote the serial numbers of the bundles.

Bundle   Weight in symbol   Actual weight (kg)
  1          $x_1$                 45
  2          $x_2$                 50
  3          $x_3$                 65
  4          $x_4$                 75
  5          $x_5$                 40

$$\text{Average} = \frac{45 + 50 + 65 + 75 + 40}{5} = \frac{275}{5} = 55 \text{ kg}$$
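The direct method just described (add up all the values, then divide by how many there are) can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `simple_mean` and the variable names are our own, and the data are the five bundle weights above.

```python
def simple_mean(values):
    """Direct method: total of all the values divided by the number of values."""
    if not values:
        raise ValueError("need at least one value")
    return sum(values) / len(values)

weights_kg = [45, 50, 65, 75, 40]  # x1 ... x5 from the example above
average = simple_mean(weights_kg)
print(sum(weights_kg), len(weights_kg), average)  # 275 5 55.0
```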
If x represents the value of the variable, the average value
is generally written as $\bar{x}$ (read "x bar"). The total number of
units or individuals may be represented by n or N.

If $x_1, x_2, x_3, \ldots, x_n$ are the values of n different
units, then the average $\bar{x}$ is written as follows (in this problem
n is equal to 5):

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

(Whether it is written N or n is immaterial at this stage.)


The letter S stands for the term Summation, which means
addition. The Greek letter used for summation is Σ (called
sigma). The first term of the sum is written at the bottom of the
letter Σ and the last term at the top. Therefore the average can
be written as follows:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{or} \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where i, ranging from 1 to n, denotes the serial number of a unit,
and there are n such units in the distribution. Since n is a
constant, it can be taken outside the Σ symbol.


When the total value of all the items is divided by the number
of items, units or individuals, the average is obtained:

$$\text{Mean} = \bar{x} = \frac{\text{Total value of all the items}}{\text{The total number of items } (n)} = \frac{\sum_{i=1}^{n} x_i}{n}$$

In other words, when the average is multiplied by the number
of items, units or individuals, we get the total value of all
the items:

$$n\bar{x} = \sum_{i=1}^{n} x_i \quad \text{or} \quad \sum_{i=1}^{n} x_i - n\bar{x} = 0$$


Property of the Arithmetic Mean

Some of the values of the units or members of a distribution
may be greater than the average and some others may be less than
it. Therefore the difference between a value and the average
will be either a positive quantity (+) or a negative quantity (-),
depending on whether that value is above or below the mean.
These differences are generally called deviations (or variations).
The sum of the deviations of all the items from the Mean is always
0 (zero). This is an important property of the Arithmetic Mean.
It can be shown as follows:


Example

Value      Deviation
$x_i$      $d_i = x_i - \bar{x}$
(1)        (2)
45         45 - 55 = -10
50         50 - 55 = -5
65         65 - 55 = +10
75         75 - 55 = +20
40         40 - 55 = -15
Total 275  0

$$\text{Average } \bar{x} = \frac{275}{5} = 55$$
Let us take column 2 separately and examine it by splitting
each deviation $x_i - \bar{x}$ into two parts, forming two sets as follows:

First set ($x_i$)    Second set ($\bar{x}$)
45                   55
50                   55
65                   55
75                   55
40                   55
Total 275            Total 275

Difference of the two totals: 275 - 275 = 0.

Find the total of each of these two sets of values. The total of
the first set of figures is equal to the total of all the items.
The total of the second set is equal to the average multiplied by
the number of units, which is again equal to the total value of
all the items. Hence the difference is 0.


We can now consider the general formula.

Value ($x_i$)   Deviation ($d_i = x_i - \bar{x}$)
(1)             (2)
$x_1$           $d_1 = x_1 - \bar{x}$
$x_2$           $d_2 = x_2 - \bar{x}$
$x_3$           $d_3 = x_3 - \bar{x}$
...             ...
$x_n$           $d_n = x_n - \bar{x}$

$$\sum_{i=1}^{n} d_i = d_1 + d_2 + d_3 + \cdots + d_n = (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \cdots + (x_n - \bar{x})$$

Remove the brackets and then rearrange the terms by grouping the
positive items and the negative items:

$$\sum_{i=1}^{n} d_i = (x_1 + x_2 + x_3 + \cdots + x_n) - (\underbrace{\bar{x} + \bar{x} + \cdots + \bar{x}}_{n \text{ times}}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$$

since $\sum_{i=1}^{n} x_i = n\bar{x}$.

General formula:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$$


This is a very important property of the Arithmetic Mean . 
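The property can also be checked numerically. The sketch below is a minimal illustration with a helper of our own naming (`deviations_from_mean`); it uses Python's `fractions.Fraction` so that the mean is exact and the zero total is not blurred by floating-point rounding.

```python
from fractions import Fraction

def deviations_from_mean(values):
    """Return the deviations d_i = x_i - mean for each value (hypothetical helper)."""
    mean = Fraction(sum(values), len(values))  # exact mean, no rounding
    return [x - mean for x in values]

weights_kg = [45, 50, 65, 75, 40]
devs = deviations_from_mean(weights_kg)
print([int(d) for d in devs])                  # [-10, -5, 10, 20, -15]
print(sum(devs))                               # 0
print(sum(deviations_from_mean([1, 2, 4])))    # 0, even though the mean is 7/3
```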


Computation of Arithmetic Mean 


There are different methods for the computation of the Arithmetic
Mean. These methods have been developed to save time and to
avoid unnecessary labour.
The following are the monthly wages of 10 workers in a factory.
Let us calculate the average wage of the workers.

Method I

S.No. of the worker    Wages (Rs.)
1                      115
2                      112
3                      117
4                      118
5                      111
6                      115
7                      112
8                      119
9                      111
10                     110
Total                  1140

Average wage per worker = 1140 / 10 = Rs. 114 per head.
Let us examine the wages of the above workers once again.
We notice that the wages of a few workers are the same. In other
words, the same value is repeated more than once. We can now
rewrite the above table as follows:

Method II

Wage per worker (Rs.)   No. of workers   Total wages (Rs.)
(1)                     (2)              (3) = col. 1 x col. 2
115                     2                230
112                     2                224
117                     1                117
118                     1                118
111                     2                222
119                     1                119
110                     1                110
Total                   10               1140

Total wages = Rs. 1140.

Average wage = 1140 / 10 = Rs. 114 per head.


In the above two methods, the average wage per worker is the
same. In the first method, which is a direct method, we have added
all the values straightaway and then calculated the average. In the
second method, we have first found the frequency of each value, as
given in column 2. Afterwards we have multiplied each wage by its
frequency to get the total wages of the workers in that group, as
given in column 3. The total wages of all the workers (column 3) is
divided by the total number of workers to give the average wage.
It should be noted that finding the frequency of each value amounts
to classification and the preparation of a frequency table. But a
frequency table prepared on this basis has no class intervals:
instead of classes with class limits, each class is represented by
a single value, much like a mid-value. The above method is also a
short and direct method.
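Methods I and II can be compared side by side in Python. In this sketch (variable names are our own), `collections.Counter` builds the frequency table of Method II from the raw wages; both routes should give the same average.

```python
from collections import Counter

wages = [115, 112, 117, 118, 111, 115, 112, 119, 111, 110]  # Rs., workers 1-10

# Method I: add all the wages directly and divide by the number of workers.
mean_direct = sum(wages) / len(wages)

# Method II: count how often each wage occurs (the frequency table),
# then total wages = sum of (wage x number of workers) over the table.
freq = Counter(wages)  # e.g. freq[115] == 2
total_wages = sum(wage * f for wage, f in freq.items())
mean_freq = total_wages / sum(freq.values())

print(mean_direct, total_wages, mean_freq)  # 114.0 1140 114.0
```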


Method III

In the above example, we find that the wages of all the workers
are more than Rs. 100. Hence we can subtract Rs. 100 from the
wage of each worker, and the balance can be written as follows:

S.No. of the worker    Actual wage - Rs. 100
(1)                    (2)
1                      15
2                      12
3                      17
4                      18
5                      11
6                      15
7                      12
8                      19
9                      11
10                     10
Total                  140

Average = 140 / 10 = Rs. 14 per head.
Since we have subtracted Rs. 100 from the wage of each
worker before calculating the average, we should add Rs. 100
back to this new average of Rs. 14 to find the average wage
of the workers in the original values.

Average wage (in original values) = 14 + 100 = Rs. 114 per head.


We can compare this with a situation where the workers are
paid their salary in one-rupee notes. Since the number of one-rupee
notes is very large, it takes time to count them and they are also
difficult to carry, so the workers would like to have one 100-rupee
note and the balance in one-rupee notes. But at home they require
only smaller denominations of the currency, namely one-rupee notes,
for their day-to-day expenses. Therefore they may again exchange
the 100-rupee note for one-rupee notes.


In order to have a clear distinction between the two averages,
we can denote the original values by X and the new values obtained
after subtracting 100 from each by Y. Then the formula will be
as follows:

$$Y = X - 100 \quad \therefore \quad \bar{Y} = \bar{X} - 100 \quad \text{and} \quad \bar{X} = \bar{Y} + 100$$

The above simplification can also be followed in the case of
Method II.


Example:

We can take 118 as an arbitrary value. The value of d will then
be equal to x - 118.

Wage (x)   d = x - 118       No. of workers (f)   Total value (fd)
115        115 - 118 = -3    2                    -6
112        112 - 118 = -6    2                    -12
117        117 - 118 = -1    1                    -1
118        118 - 118 = 0     1                    0
111        111 - 118 = -7    2                    -14
119        119 - 118 = +1    1                    +1
110        110 - 118 = -8    1                    -8
Total                        10                   -40

Average of d: $\bar{d} = -40 / 10 = -4$.

Since d = x - 118, $\bar{d} = \bar{x} - 118$, so

$$\bar{x} = \bar{d} + 118 = -4 + 118 = \text{Rs. } 114$$


Instead of taking the value occupying the central position in the
table (118), we can take another value, 115. The working would then
be as follows:

Wage (x)   d = x - 115       No. of workers (f)   Total value (fd)
115        115 - 115 = 0     2                    0
112        112 - 115 = -3    2                    -6
117        117 - 115 = +2    1                    +2
118        118 - 115 = +3    1                    +3
111        111 - 115 = -4    2                    -8
119        119 - 115 = +4    1                    +4
110        110 - 115 = -5    1                    -5
Total                        10                   -10

$$\bar{d} = \frac{-10}{10} = -1, \qquad \bar{x} = \bar{d} + 115 = -1 + 115 = 114$$


Though we have adopted different arbitrary values, the method
adopted in all the cases is one and the same, and the final value
of the average is also the same.
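The fact that any arbitrary value gives the same final average can be checked with a small Python function. This is a sketch under our own naming (`mean_by_shifting_base` is a hypothetical helper, not a standard routine); it works on the frequency table of the wages and tries the three bases used above: 100, 118 and 115.

```python
from collections import Counter

def mean_by_shifting_base(freq, a):
    """Mean of a frequency table via deviations d = x - a (hypothetical helper)."""
    n = sum(freq.values())
    sum_fd = sum(f * (x - a) for x, f in freq.items())
    d_bar = sum_fd / n
    return d_bar + a  # undo the subtraction: x_bar = d_bar + a

wages = [115, 112, 117, 118, 111, 115, 112, 119, 111, 110]
freq = Counter(wages)
for a in (100, 118, 115):
    print(a, mean_by_shifting_base(freq, a))
# 100 114.0
# 118 114.0
# 115 114.0
```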


There are many shorter methods, and we shall examine their
advantages. Let us examine the salaries of the following 10 workers
and compute their average.


Short-cut Method I

S.No. of the worker   Salary (x) Rs.   d = x - 155 (A = 155)
(1)                   (2)              (3)
1                     135              135 - 155 = -20
2                     145              145 - 155 = -10
3                     180              180 - 155 = +25
4                     185              185 - 155 = +30
5                     195              195 - 155 = +40
6                     155              155 - 155 = 0
7                     170              170 - 155 = +15
8                     130              130 - 155 = -25
9                     140              140 - 155 = -15
10                    165              165 - 155 = +10
Total                 1600             50

$$\bar{d} = \frac{\sum d}{N} = \frac{50}{10} = 5, \qquad \bar{x} = \bar{d} + A = 5 + 155 = \text{Rs. } 160$$


Since we have reduced each of the original values by subtracting
155 from it, we have to add 155 to the average of the new values
to get the average of the original values. This operation is just
the reverse of the original operation.

The total salary of all the 10 persons as per column 2, i.e. by the
direct method, is Rs. 1600. Hence the average salary by the direct
method is also 1600 / 10 = Rs. 160. In both methods, the average
obtained is the same.

This method of computation is known as computation of the
Arithmetic Mean by shifting the base, since we are shifting the
base (the value from which the values are measured) to 155.
General case

Value ($x_i$)   Deviation from A ($d_i = x_i - A$)
$x_1$           $d_1 = x_1 - A$
$x_2$           $d_2 = x_2 - A$
$x_3$           $d_3 = x_3 - A$
...             ...
$x_n$           $d_n = x_n - A$

Sum of the deviations:

$$\sum d = (x_1 - A) + (x_2 - A) + \cdots + (x_n - A) = (x_1 + x_2 + \cdots + x_n) - nA = \sum x - nA = n\bar{x} - nA$$

Dividing by n:

$$\bar{d} = \frac{\sum d}{n} = \bar{x} - A, \qquad \therefore \ \bar{x} = \bar{d} + A$$


Short-cut Method II

In this method we divide each value by a common number
instead of subtracting an arbitrary value (155) from each value
as in the first method. Let us consider the same example
and divide the salary of each person by the common number 5.


S.No. of the person   Salary (x) Rs.   d = x / 5
(1)                   (2)              (3)
1                     135              135 / 5 = 27
2                     145              145 / 5 = 29
3                     180              180 / 5 = 36
4                     185              185 / 5 = 37
5                     195              195 / 5 = 39
6                     155              155 / 5 = 31
7                     170              170 / 5 = 34
8                     130              130 / 5 = 26
9                     140              140 / 5 = 28
10                    165              165 / 5 = 33
Total                 1600             320

Total of d = Σd = 320.

$$\bar{d} = \frac{\sum d}{n} = \frac{320}{10} = 32$$

We have assumed d = x / 5, so x = 5d, and therefore

$$\bar{x} = 5 \times \bar{d} = 5 \times 32 = \text{Rs. } 160$$


This method is known as computation by change of scale.
It can be compared with our day-to-day experience. Consider
workers getting their salary in one-rupee notes. The one-rupee
notes can be exchanged for five-rupee notes. The number of
one-rupee notes each worker would get is the value given in
column (2); the number of five-rupee notes each worker would get
after the exchange is given in column (3). Since the 10 workers
together get 320 five-rupee notes, the average number of five-rupee
notes per worker is 32. These five-rupee notes can be exchanged for
one-rupee notes again. As each five-rupee note can be converted into
5 one-rupee notes, the 32 five-rupee notes can be changed into
32 x 5 = 160 one-rupee notes.

Since we initially reduced the size of the values by dividing
each value by 5, we have to multiply the average of the new
values by 5. The process adopted is just the reversal of the
original operation.
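The change of scale can be sketched as follows. As an assumption of this illustration, every salary is a multiple of the divisor 5, so each d is a whole number of five-rupee notes; `Fraction` keeps the arithmetic exact. Variable names are our own.

```python
from fractions import Fraction

salaries = [135, 145, 180, 185, 195, 155, 170, 130, 140, 165]  # Rs.
c = 5  # common divisor; assumed to divide every salary exactly

d = [Fraction(x, c) for x in salaries]   # d = x / c (number of five-rupee notes)
d_bar = sum(d) / len(d)                  # average of d
x_bar = c * d_bar                        # undo the division: x_bar = c * d_bar

print([int(v) for v in d])   # [27, 29, 36, 37, 39, 31, 34, 26, 28, 33]
print(sum(d), d_bar, x_bar)  # 320 32 160
```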


Short-cut Method III: Computation by shifting the base and changing
the scale

There is still another short-cut method which uses the
previous two methods simultaneously. In this method we first
subtract an arbitrary value from each value, and the balance is then
divided by a common value. However, this becomes cumbersome if
the balances have no common factor.


S.No.   Salary (x)   y = x - 100   z = (x - 100) / 5
(1)     (2)          (3)           (4)
1       135          35            7
2       145          45            9
3       180          80            16
4       185          85            17
5       195          95            19
6       155          55            11
7       170          70            14
8       130          30            6
9       140          40            8
10      165          65            13
Total   1600         600           120
Average 1600/10      600/10        120/10

i.e. $\bar{x} = 160$, $\bar{y} = 60$, $\bar{z} = 12$.

$$\bar{x} = \bar{y} + 100 = 60 + 100 = 160$$

$$\bar{x} = 5\bar{z} + 100 = 5 \times 12 + 100 = 60 + 100 = 160$$
As in Method I, we have subtracted 100 from each of the x values,
and the values thus obtained are given in column (3) as y values.
Then each of the y values in column (3) is divided by 5 to reduce
it still further, as in Method II, and the values thus obtained are
given in column (4) as z values.

In the process of conversion from x values to z values,
the following two operations are done:

1. Subtraction of 100 from x to convert it into y.
2. Division of y by 5 to get z.

$$z = \frac{y}{5} = \frac{x - 100}{5} \quad \therefore \quad x = 5z + 100, \qquad \bar{x} = 5\bar{z} + 100 = 5 \times 12 + 100 = 60 + 100 = \text{Rs. } 160$$

Since division by 5 is the last step, we first have to multiply the
average of z by 5. Since the subtraction of 100 from x is the
first step, we then have to add 100 to the average of y obtained by
multiplying the average of z by 5. The operations are just the
opposite of the ones carried out at the beginning, and they are
carried out in the reverse order.


Generally the third method, which is the combination of
the first two methods, will be adopted. Further, the arbitrary value
will be denoted by the letter A and the divisor by the
letter C. The formula would then be as follows:

$$d = \frac{x - A}{C} \quad \therefore \quad \bar{d} = \frac{\bar{x} - A}{C}, \qquad C\bar{d} = \bar{x} - A, \qquad \bar{x} = C\bar{d} + A$$
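The general formula $\bar{x} = C\bar{d} + A$ can be written as one small Python function for ungrouped data. This is a sketch only: `mean_step_deviation` is a hypothetical name of our own, C must not be zero, and `Fraction` is used so that the result is exact. Any choice of A and C should return the same mean.

```python
from fractions import Fraction

def mean_step_deviation(values, a, c):
    """Ungrouped data (hypothetical helper): d = (x - a) / c, then x_bar = c * d_bar + a.

    c must be non-zero; Fraction keeps every step exact.
    """
    d = [Fraction(x - a, c) for x in values]
    d_bar = sum(d) / len(d)
    return c * d_bar + a

salaries = [135, 145, 180, 185, 195, 155, 170, 130, 140, 165]
print(mean_step_deviation(salaries, a=100, c=5))  # 160
print(mean_step_deviation(salaries, a=155, c=5))  # 160
print(mean_step_deviation(salaries, a=0, c=1))    # 160 (the direct method)
```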
Mean of a Discrete Frequency Distribution


(i) Direct Method

We shall consider the following frequency distribution:

Value (x) kg    Frequency (f)   Total value (xf or fx)
$x_1$ = 15      $f_1$ = 3       $x_1 f_1$ = 45
$x_2$ = 12      $f_2$ = 4       $x_2 f_2$ = 48
$x_3$ = 17      $f_3$ = 2       $x_3 f_3$ = 34
$x_4$ = 18      $f_4$ = 1       $x_4 f_4$ = 18
Total           10              145


Since there are four values of x, they are denoted by the
symbols $x_1, x_2, x_3$ and $x_4$, and their respective frequencies (f)
are denoted by the symbols $f_1, f_2, f_3$ and $f_4$.

The total of each value (xf) is obtained by multiplying the
value by its frequency. Thus there are four totals, represented
by $x_1f_1, x_2f_2, x_3f_3$ and $x_4f_4$. Their grand total is equal to
$x_1f_1 + x_2f_2 + x_3f_3 + x_4f_4$. The total number of items is equal
to the total of the frequencies, that is, $f_1 + f_2 + f_3 + f_4$.

$$\bar{x} = \frac{\text{The grand total of the values}}{\text{Total number of items}} = \frac{x_1f_1 + x_2f_2 + x_3f_3 + x_4f_4}{f_1 + f_2 + f_3 + f_4}$$

The formula can be expressed in symbols as

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} \quad (\text{or}) \quad \frac{\sum f_i x_i}{\sum f_i}$$

$$\bar{x} = \frac{45 + 48 + 34 + 18}{3 + 4 + 2 + 1} = \frac{145}{10} = 14.5 \text{ kg}$$


Short-cut Method I (Shifting the base)

We can adopt the short-cut method also. We can take 10
as the arbitrary value. Then the values will be changed as follows,
without any change in the frequencies:

x     d = x - 10      f    df
15    15 - 10 = 5     3    15
12    12 - 10 = 2     4    8
17    17 - 10 = 7     2    14
18    18 - 10 = 8     1    8
Total                 10   45

$$\bar{d} = \frac{\sum df}{\sum f} = \frac{45}{10} = 4.5$$

Since d = x - 10, $\bar{d} = \bar{x} - 10$, and therefore

$$\bar{x} = \bar{d} + 10 = 4.5 + 10 = 14.5 \text{ kg}$$
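For a discrete frequency distribution, both the direct formula $\bar{x} = \sum xf / \sum f$ and the shifted-base version can be sketched in Python as follows, using the values and frequencies of the example above (the variable names are our own).

```python
values = [15, 12, 17, 18]  # x (kg)
freqs = [3, 4, 2, 1]       # f

n = sum(freqs)                                     # total number of items, sum of f
sum_fx = sum(x * f for x, f in zip(values, freqs))
mean_direct = sum_fx / n                           # x_bar = sum(fx) / sum(f)

a = 10                                             # arbitrary value (the new base)
sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))
mean_shifted = sum_fd / n + a                      # x_bar = d_bar + a

print(n, sum_fx, mean_direct)   # 10 145 14.5
print(sum_fd, mean_shifted)     # 45 14.5
```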


Short-cut Method II (Changing the scale)

In this method, each of the discrete values is reduced in
size by dividing it by a constant number.

x (kg)   f     xf      d = x / 15   df
(1)      (2)   (3)     (4)          (5)
60       2     120     4            8
75       4     300     5            20
90       6     540     6            36
105      3     315     7            21
135      5     675     9            45
Total    20    1950                 130

By the direct method,

$$\bar{x} = \frac{\sum xf}{\sum f} = \frac{1950}{20} = 97.5 \text{ kg}$$

In the above case, each value is divided by 15 and the
value obtained is given in col. (4). The product of d and the
frequency is given in col. (5).

$$\sum fd = 130, \qquad \bar{d} = \frac{\sum fd}{\sum f} = \frac{130}{20} = 6.5$$

Since d = x / 15, we have x = 15d and $\bar{x} = 15\bar{d}$:

$$\bar{x} = 15 \times 6.5 = 97.5 \text{ kg}$$


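General Case

$x_i$    $f_i$    $d_i = x_i / C$    $f_i d_i$
$x_1$    $f_1$    $d_1 = x_1 / C$    $f_1 d_1 = f_1 x_1 / C$
$x_2$    $f_2$    $d_2 = x_2 / C$    $f_2 d_2 = f_2 x_2 / C$
...      ...      ...                ...
$x_n$    $f_n$    $d_n = x_n / C$    $f_n d_n = f_n x_n / C$

$$\sum f_i d_i = f_1\frac{x_1}{C} + f_2\frac{x_2}{C} + \cdots + f_n\frac{x_n}{C} = \frac{1}{C}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) = \frac{1}{C}\sum f_i x_i$$

Dividing by N:

$$\frac{\sum f_i d_i}{N} = \frac{1}{C} \cdot \frac{\sum f_i x_i}{N}, \qquad \text{i.e. } \bar{d} = \frac{\bar{x}}{C}, \qquad \therefore \ \bar{x} = C\bar{d}$$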


We find that the averages obtained both by direct method 
and short - cut method are one and the same . 


Short-cut Method III (Shifting the base and changing the scale)

Let us combine short-cut methods I and II for the calculation
of the average. We shall consider the same example. We shall
take 90 as the arbitrary value and subtract it from each value
of x, and the result obtained can be divided by a common number,
15.

x      f     y = x - 90        d = y / 15   fd
(1)    (2)   (3)               (4)          (5)
60     2     60 - 90 = -30     -2           -4
75     4     75 - 90 = -15     -1           -4
90     6     90 - 90 = 0       0            0
105    3     105 - 90 = 15     1            3
135    5     135 - 90 = 45     3            15
Total  20                                   10

The value of d obtained from the value of x by the substitution
d = (x - 90) / 15 is given in col. (4). The product of each item
in column (4) and the respective frequency is given in column (5).

$$\sum fd = 10, \quad \sum f = 20, \quad \bar{d} = \frac{10}{20} = 0.5$$

Since $d = \frac{x - 90}{15}$, we have x = 15d + 90, and therefore

$$\bar{x} = 15\bar{d} + 90 = 15 \times 0.5 + 90 = 7.5 + 90 = 97.5 \text{ kg}$$
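Short-cut Method III for a discrete distribution can be condensed into one function. In this sketch, `grouped_mean_step_deviation` is a hypothetical helper of our own; setting A = 0 reduces it to the change-of-scale method, and A = 0 with C = 1 reduces it to the direct method, so all three calls should agree.

```python
def grouped_mean_step_deviation(values, freqs, a, c):
    """x_bar = c * (sum(f*d) / sum(f)) + a, where d = (x - a) / c (hypothetical helper)."""
    n = sum(freqs)
    sum_fd = sum(f * (x - a) / c for x, f in zip(values, freqs))
    return c * sum_fd / n + a

values = [60, 75, 90, 105, 135]   # x (kg)
freqs = [2, 4, 6, 3, 5]           # f

print(grouped_mean_step_deviation(values, freqs, a=90, c=15))  # 97.5
print(grouped_mean_step_deviation(values, freqs, a=0, c=15))   # 97.5 (change of scale only)
print(grouped_mean_step_deviation(values, freqs, a=0, c=1))    # 97.5 (direct method)
```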


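General Case:

$x_i$   $f_i$   $d_i = (x_i - A)/C$   $f_i d_i$
$x_1$   $f_1$   $d_1 = (x_1 - A)/C$   $f_1 d_1$
$x_2$   $f_2$   $d_2 = (x_2 - A)/C$   $f_2 d_2$
...     ...     ...                   ...
$x_n$   $f_n$   $d_n = (x_n - A)/C$   $f_n d_n$

$$\sum f_i d_i = f_1\frac{x_1 - A}{C} + f_2\frac{x_2 - A}{C} + \cdots + f_n\frac{x_n - A}{C}$$

$$= \frac{1}{C}(f_1x_1 - f_1A + f_2x_2 - f_2A + \cdots + f_nx_n - f_nA)$$

$$= \frac{1}{C}\left[(f_1x_1 + f_2x_2 + \cdots + f_nx_n) - (f_1A + f_2A + \cdots + f_nA)\right]$$

$$= \frac{1}{C}\left(\sum x_i f_i - A\sum f_i\right) = \frac{1}{C}\left(\sum x_i f_i - NA\right) = \frac{1}{C}(N\bar{x} - NA)$$

Dividing by N:

$$\bar{d} = \frac{\sum f_i d_i}{N} = \frac{\bar{x} - A}{C}, \qquad C\bar{d} = \bar{x} - A, \qquad \therefore \ \bar{x} = C\bar{d} + A$$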
It is seen that in the case of a frequency distribution of discrete
values also, short-cut methods can be adopted with great
advantage. In all these short-cut methods, we reduce the size of the
original values by a substitution and thus save time and labour
in the computation.


Mean of a continuous frequency distribution 


Direct Method 


Let us consider the following continuous distribution, which 
is constructed from the raw data . 


Class (kg)       Frequency
(1)              (2)
60.5 - 70.5      1
70.5 - 80.5      5
80.5 - 90.5      9
90.5 - 100.5     14
100.5 - 110.5    15
110.5 - 120.5    4
120.5 - 130.5    2
Total            50


In the case of a continuous frequency distribution giving the
different classes and their respective frequencies, we have to
first calculate the class marks or mid-values of the classes.
This is a very important step, and it should be attempted first
whenever only the class limits of a frequency distribution are
given. The mid-value of a class is calculated by finding the average
of its lower and upper limits. Once the mid-values are calculated,
they may be used for all other statistical purposes. The main
assumption made indirectly is that all the values within a class
are approximately equal to the mid-value of the class, which may
not be strictly correct. However, this assumption causes little
disadvantage in practice, and it gives us considerable advantages
in computation. The frequency distribution will be converted as
follows:


Classes (kg)    Mid-values ($x_i$)   Frequencies ($f_i$)   Total weight kg ($f_i x_i$)
(1)             (2)                  (3)                   (4) = col. 2 x col. 3
60.5 - 70.5     65.5                 1                     65.5
70.5 - 80.5     75.5                 5                     377.5
80.5 - 90.5     85.5                 9                     769.5
90.5 - 100.5    95.5                 14                    1337.0
100.5 - 110.5   105.5                15                    1582.5
110.5 - 120.5   115.5                4                     462.0
120.5 - 130.5   125.5                2                     251.0
Total                                50                    4845.0

$$\text{Average} = \frac{4845}{50} = 96.9 \text{ kg per head}$$


The above frequency distribution was constructed with the
help of the data given in an example. The Mean calculated directly
from the raw data, without any grouping, is 97.0. The difference
between the Means calculated by these two methods is only 0.1
(97.0 - 96.9), which is insignificant.

In this context, it may be noted that the average calculated
by the first method (from the raw data) is the correct one, though
the calculation is very laborious. When we consider the ease with
which the average is calculated by the second method (from the
grouped data), and also the insignificant difference, the second
method is more advantageous. The difference arises from the
assumption, explained earlier, that all the students within a
particular class have a weight equal to the mid-value of that
class. In an actual situation this assumption is not exactly
correct, so the difference is due to the classification. As the
width of the class interval is reduced, the difference will
generally also be reduced: the smaller the class interval, the
smaller the difference and consequently the greater the accuracy.
While the average obtained from the first method is the true value
of the average, the average obtained from the second method may be
called an estimate of the average. This explains the difference
between the two values.
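The grouped (mid-value) computation can be sketched in Python. This is an illustration under the same assumption as the text, namely that every item in a class takes the class mid-value; the variable names are our own, and the class limits and frequencies are those of the table above.

```python
classes = [(60.5, 70.5), (70.5, 80.5), (80.5, 90.5), (90.5, 100.5),
           (100.5, 110.5), (110.5, 120.5), (120.5, 130.5)]  # weight classes (kg)
freqs = [1, 5, 9, 14, 15, 4, 2]

# Mid-value of each class = average of its lower and upper limits.
mids = [(lo + hi) / 2 for lo, hi in classes]

# Grouping assumption: every item in a class has the class mid-value.
n = sum(freqs)
total_weight = sum(m * f for m, f in zip(mids, freqs))
mean_grouped = total_weight / n

print(mids)                        # [65.5, 75.5, 85.5, 95.5, 105.5, 115.5, 125.5]
print(total_weight, mean_grouped)  # 4845.0 96.9
```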


We can simplify the computation process further. Hereafter
it is enough to consider only the mid-values of the class intervals,
since the mid-value represents the class itself.


Short-cut Method I (Shifting the base)

As we have adopted short-cut methods for discrete distributions,
we can follow similar short-cut methods here.

Mid-values (x)   Frequency (f)
65.5             1
75.5             5
85.5             9
95.5             14
105.5            15
115.5            4
125.5            2
Total            50
If the mid-values and the frequencies are large numbers,
multiplying them becomes laborious. We cannot alter the
frequencies. However, we can reduce the size of the mid-values
by subtracting a convenient common arbitrary value. Here the
central value, namely 95.5, can be taken as the arbitrary value.
The table can be rearranged as follows:


Mid-value (x)   New value d = x - 95.5   Frequency (f)   Value (df)
(1)             (2)                      (3)             (4)
65.5            65.5 - 95.5 = -30        1               -30
75.5            75.5 - 95.5 = -20        5               -100
85.5            85.5 - 95.5 = -10        9               -90
95.5            95.5 - 95.5 = 0          14              0
105.5           105.5 - 95.5 = +10       15              +150
115.5           115.5 - 95.5 = +20       4               +80
125.5           125.5 - 95.5 = +30       2               +60
Total                                    50              70

(Sum of the negative df values = -220; sum of the positive df values = +290.)


A set of values of the new variable d is obtained. We
can also find, as before, the average of the new variable d.
The total of all the df values is equal to 70. Therefore the average
of d is equal to 70 / 50 = 1.4. Each of the d values is less than
the corresponding x value by 95.5 kg. Therefore the average
of d will also be less than the average of x by 95.5 kg.
$$d = x - 95.5 \quad \therefore \quad \bar{d} = \bar{x} - 95.5$$

$$\bar{x} = \bar{d} + 95.5 = 1.4 + 95.5 = 96.9$$


This is the same as the value calculated previously, and it is also an estimate of the average. In the case of x, the values are measured from 0, i.e. with 0 as the base. But in the case of d, the values are measured from 95.5, i.e. with the arbitrary value as the base. Here we are actually shifting the base from 0 to 95.5 (A) and converting the values into a new variable. Therefore, the operation involved in this process is shifting of the base.
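The shift of base can be checked with a short Python sketch. It assumes the mid-values and frequencies of the 50 bags from the table above; the function name `mean_by_shifting_base` is hypothetical. It computes the mean of d = x - A and then adds A back, exactly as in the working above.

```python
# Mid-values (x) and frequencies (f) of the 50 bags, as in the table above.
mid_values = [65.5, 75.5, 85.5, 95.5, 105.5, 115.5, 125.5]
frequencies = [1, 5, 9, 14, 15, 4, 2]


def mean_by_shifting_base(xs, fs, a):
    """Arithmetic mean of x computed through d = x - a (shifting the base to a)."""
    n = sum(fs)
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))  # total of the fd column
    mean_d = sum_fd / n                                 # average of the new variable d
    return mean_d + a                                   # shift the base back from a to 0


print(round(mean_by_shifting_base(mid_values, frequencies, 95.5), 2))  # 96.9
print(round(mean_by_shifting_base(mid_values, frequencies, 0), 2))     # 96.9 (no shift at all)
```

Any choice of A gives the same mean; taking the central value 95.5 only keeps the products fd small.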


Computation of Arithmetic Mean by shifting the base and also changing the scale


In the above process of shifting the base, the deviations from 95.5 are still large, such as 30, 20, 10, etc. They can be reduced further by dividing each of them by the class interval of the table (C), which is 10 in this problem, i.e. the deviations are expressed in terms of class intervals. In the working below, the deviation x - 95.5 is denoted by y and the new variable by d = y/10:


    x        y = x - 95.5    d = y/10      f       fd
   (1)           (2)            (3)       (4)      (5)
   65.5          -30            -3          1       -3
   75.5          -20            -2          5      -10
   85.5          -10            -1          9       -9   (total of negatives: -22)
   95.5 (A)        0             0         14        0
  105.5          +10            +1         15      +15
  115.5          +20            +2          4       +8
  125.5          +30            +3          2       +6   (total of positives: +29)
  Total                                    50       +7


The total of all the fd values = +7.

Therefore, the Arithmetic Mean of d is $\bar{d} = 7/50 = 0.14$.

Since $d = y/10 = y/C$, we have $y = Cd$ and hence $\bar{y} = C\bar{d}$.

Since $y = x - 95.5$, we have $x = y + 95.5$, so

$$\bar{x} = \bar{y} + 95.5 = C\bar{d} + A = 10 \times 0.14 + 95.5 = 1.4 + 95.5 = 96.9 \qquad (\text{since } x = Cd + A)$$


In this method the operations involved in converting the x variable into the d variable are (1) subtraction of the arbitrary value and (2) division by the class interval. Therefore, to calculate the Mean of x from the mean of the new variable, we have to perform the opposite operations, and in the reverse order.


First we have to multiply the mean of the new variable d 
by the class interval C and then add the arbitrary value A . 
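A minimal Python sketch of the step-deviation method, assuming the same 50-bag table with A = 95.5 and C = 10 as in the text; the function name `mean_by_step_deviation` is hypothetical. It converts x to d = (x - A)/C and then undoes the two operations in reverse order.

```python
mid_values = [65.5, 75.5, 85.5, 95.5, 105.5, 115.5, 125.5]
frequencies = [1, 5, 9, 14, 15, 4, 2]


def mean_by_step_deviation(xs, fs, a, c):
    """Mean of x via d = (x - A) / C, i.e. shifting the base and changing the scale."""
    n = sum(fs)
    ds = [(x - a) / c for x in xs]                    # -3, -2, ..., +3 for A = 95.5, C = 10
    mean_d = sum(f * d for d, f in zip(ds, fs)) / n   # 7 / 50 = 0.14
    return c * mean_d + a                             # reverse order: multiply by C, then add A


print(round(mean_by_step_deviation(mid_values, frequencies, 95.5, 10), 2))  # 96.9
```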


Here also the Mean value obtained is only an estimate of the true Mean. This method corresponds to the conversion of Fahrenheit temperatures into Centigrade and vice versa. The freezing point of water reads 32° on the Fahrenheit thermometer and 0° on the Centigrade thermometer; in this respect, the bases are different. The boiling point of water reads 212° and 100° respectively. The scales are also different, since the 180 divisions between these two points on the Fahrenheit scale correspond to only 100 divisions on the Centigrade scale.


For conversion of Fahrenheit into Centigrade, we first subtract 32 from the reading and then divide the balance by 9/5, since 180/100 = 9/5. Hence, for conversion from Centigrade to Fahrenheit, we first multiply the Centigrade reading by 9/5 and then add 32°. It may be noted that all the short-cut methods give the same value of the Mean.


Conversion of 113°F into Centigrade:

$$113 - 32 = 81; \qquad 81 \div \frac{9}{5} = 81 \times \frac{5}{9} = 45^\circ\text{C}$$

Mean of the combinations

Weighted average (or) Weighted Mean

Let us consider 3 groups of students, namely A, B and C. The number of students differs from group to group, and so does their average weight. Let us combine all the three groups and find the average weight of the combined group. The data are given in the following table.


Group   No. of students   Average weight of    Total weight of all the
        in the group      students (kg)        students in the group (kg)
 (1)         (2)               (3)                   (4)
  A          10                48               48 x 10 =  480
  B          15                52               52 x 15 =  780
  C          25                40               40 x 25 = 1000
Total        50                                           2260

Average = 2260 / 50 = 45.2 kg per head.


Though it appears somewhat cumbersome at first, it is very simple when we think it over. We have calculated the total number of students in all the groups (10 + 15 + 25 = 50). Next, we have calculated the total weight of each group of students by multiplying the average weight of the group by the number of students in it, as given in the last column of the table. By adding the total weights of the different groups, we get the total weight of all the students in the combination. Finally, the total weight of all the students in the combination is divided by the total number of students in the combination, which gives the average weight of the students in the combination.


Total number of students in the combination = 10 + 15 + 25 = 50.

Total weight of students in the combination = (48 x 10 + 52 x 15 + 40 x 25) = 480 + 780 + 1000 = 2260 kg.

Therefore, the average = 2260 / 50 = 45.2 kg.


The simple average is calculated by adding the group averages and dividing the total thus arrived at by the number of groups.


Number of groups = 3.
The total of the averages = 48 + 52 + 40 = 140.
Average = 140/3 = 46.7 kg.


The difference between the simple average (46.7 kg) and the weighted average (45.2 kg) should be noted. The simple average gives every group equal importance, although group C, which has the lowest average weight, is also the largest group. The two averages will be the same when the number of students is equal in all the groups. In this problem, the number of students in each group forms the weight of the respective group.
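A short Python sketch contrasting the two averages for groups A, B and C; the function names are hypothetical. The weighted mean divides the total weight by the total number of students, while the simple mean just averages the three group averages.

```python
# (number of students, average weight in kg) for groups A, B and C.
groups = [(10, 48), (15, 52), (25, 40)]


def weighted_mean(pairs):
    """Combined average: total of n * average, divided by the total number n."""
    total_n = sum(n for n, _ in pairs)
    total_weight = sum(n * avg for n, avg in pairs)  # 480 + 780 + 1000 = 2260
    return total_weight / total_n


def simple_mean(values):
    """Plain average of the group averages, ignoring the group sizes."""
    return sum(values) / len(values)


print(weighted_mean(groups))                              # 45.2
print(round(simple_mean([avg for _, avg in groups]), 1))  # 46.7
```

With equal group sizes, for example five students in every group, the two functions return the same value.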
We can now consider the symbolic representation for this.

Group     No. of students   Average weight of        Total weight of all
          in the group      students in the group    the students in the group
 (1)           (2)                 (3)                        (4)
A or 1         n1                  x̄1                        n1 x̄1
B or 2         n2                  x̄2                        n2 x̄2
C or 3         n3                  x̄3                        n3 x̄3

Let x̄ represent the average weight of the combination, or the overall average weight. Then

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3}$$

Let $n_1 + n_2 + n_3 = N$. Then

$$N\bar{x} = n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3, \qquad \bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{N}$$

The above table can be re-arranged as follows:


Group   Average weight   No. of students   Total weight of all the
                         in the group      students in the group
  A        x̄1 (48)         n1 (10)         n1 x̄1 = 10 x 48
  B        x̄2 (52)         n2 (15)         n2 x̄2 = 15 x 52
  C        x̄3 (40)         n3 (25)         n3 x̄3 = 25 x 40


In this table, the average weight of each group, x̄i, can be taken as the variable, and the number of students in each group as the frequency of the respective group. Hence the table can again be re-written as follows.


Value (xi)   Frequency (fi)   xi fi
 x1 (48)       f1 (10)        48 x 10 =  480
 x2 (52)       f2 (15)        52 x 15 =  780
 x3 (40)       f3 (25)        40 x 25 = 1000
 Total              50                  2260

$$\bar{x} = \frac{\sum x_i f_i}{N} = \frac{\sum x_i f_i}{\sum f_i} = \frac{2260}{50} = 45.2 \text{ kg}$$


Change in the formula 


In the formula for the average, the number is written first and the value afterwards,

$$\bar{x} = \frac{\sum n_i x_i}{\sum n_i},$$

whenever the number of members in each group is referred to as numbers. On the other hand, when the number of members in each group is referred to as frequencies, the value is written first and the frequency afterwards:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}.$$

But in practice there is no difference between these two formulae:

$$\bar{x} = \frac{\sum n_i x_i}{\sum n_i} = \frac{\sum x_i f_i}{\sum f_i}.$$


It should be noted that whenever the average or Mean is mentioned, it is only the Arithmetic Mean and not any other Mean. It may be either the simple Mean or the Weighted Mean, depending upon the circumstances.


MEDIAN 


The Median occupies the second most important place, next to the Mean, in statistical analysis. Median means middle; therefore, the Median is defined as that value of the variable which divides the distribution into two equal halves, so that an equal number of units, individuals or items lie on either side of the Median value. It should be clearly noted that the Median is calculated with reference to the position or location of the items rather than with reference to their values. Arranging the units or items in the order of their magnitude is therefore necessary for determining the Median; in other words, the units have to be arranged in an array or a frequency distribution. Such an arrangement is not necessary for computing the Mean, since the values of all the items are totalled irrespective of their position in the series.


Properties of Median 

1. It divides the distribution into two equal parts. Therefore, the number of units or items in each part will be equal.

2. The values of all the items in one part will be not less than the value of the Median. Similarly, the values of all the items in the other part will be not greater than the value of the Median.

Hence it can be termed a locational, positional or central average. The Median can be computed for ungrouped data as well as for grouped data, as in the case of the Mean.


Computation of Median for ungrouped data 

Let us suppose the statistical details given below relate to the weights of seven persons, expressed in kg.


45 , 39 , 48 , 42, 50 , 35 , 37 . 


In order to locate the Median, we first have to re-arrange the values in either ascending or descending order of magnitude. Let us arrange them in ascending order as follows:

35, 37, 39 , 42 , 45 , 48 , 50 . 


Since there are seven items, the fourth item is the central value. Therefore, the value of the fourth item, i.e. 42 kg, is the median value in this series.


Let us consider another example, where the number of items is even:


45 , 35 , 30 , 41 , 47 , 38 , 48 , 52 . 


Let us rearrange the values as follows:
30, 35, 38, 41, 45, 47, 48 and 52.


As there are eight items in this series, no single item can be termed the central item; the 4th and 5th items together constitute the central items. Therefore, any value between 41 and 45 (such as 42, 43 or 44) can satisfy the condition of the Median. But it is always advisable to take the value midway between these two values, 41 and 45.


$$\text{Median} = \frac{41 + 45}{2} = \frac{86}{2} = 43$$


The Median will also be expressed in the same unit of measurement as the original items.


General Formula 


If there are n items in the series, the value of the $\frac{n+1}{2}$th item will be the Median. If there are seven items in the series, the $\frac{7+1}{2} = \frac{8}{2} = 4$th item will be the Median. On the other hand, if there are eight items in the series, the $\frac{8+1}{2} = \frac{9}{2} = 4.5$th item will be the Median. In other words, it will be equal to the average of the values of the 4th and 5th items in the series.
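The general formula can be sketched in Python as follows; the function name `median` is hypothetical. It sorts the values, finds the (n + 1)/2th position, and when that position falls half-way between two items it takes their average, as in the eight-item example.

```python
def median(values):
    """Median as the value of the (n + 1)/2-th item of the values in ascending order."""
    xs = sorted(values)
    pos = (len(xs) + 1) / 2   # 4 for seven items, 4.5 for eight items
    k = int(pos)              # whole part of the position (1-based)
    if pos == k:
        return xs[k - 1]      # odd n: a single central item
    # even n: half-way between the k-th and (k + 1)-th items
    return (xs[k - 1] + xs[k]) / 2


print(median([45, 39, 48, 42, 50, 35, 37]))      # 42
print(median([45, 35, 30, 41, 47, 38, 48, 52]))  # 43.0
```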


Median for discrete frequency distribution

Let us examine the following frequency distribution of marks 
obtained by students in a class. 
Marks    No. of students
 47             5
 51             2
 55             4
 59             2
 61             1
Total          14


Because of the frequencies, the above 5 values (47, 51, 55, 59, 61) can be written out item by item. The total number of items in the series will be 14. The value 47 has to be repeated five times, because its frequency is 5. Similarly, each of the other values has to be repeated as many times as its frequency. The series will be as follows:


S.No.   Value (x)   S.No. of the last item of each value
  1        47
  2        47
  3        47
  4        47
  5        47          → 5
  6        51
  7        51          → 7
  8        55
  9        55
 10        55
 11        55          → 11
 12        59
 13        59          → 13
 14        61          → 14


$$\text{Median} = \text{value of the } \frac{n+1}{2}\text{th item} = \frac{14+1}{2}\text{th item} = 7.5\text{th item}$$

The value of the 7½th item is the Median value. The items after the 7th and up to the 11th are all 55.

Therefore, the Median is 55.


In this process we have followed the laborious method of repeating the same value again and again and giving each item a serial number. Instead, we can calculate the Median from the cumulative (less than) frequency.


Let us calculate the cumulative frequency of the values as 
follows : 


Marks (x)   Frequency (f)   Cumulative frequency (c.f.)
   47             5               → 5
   51             2               → 7
   55             4               → 11
   59             2               → 13
   61             1               → 14


It may be seen from this that the items after the 7th and up to the 11th are 55. The figure marked by an arrow on the right-hand side of each value is nothing but the cumulative frequency, which is the same as the serial number of the last item for that value. This holds in general, so the cumulative frequency can be used in place of the laborious serial listing.
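The cumulative-frequency shortcut can be sketched in Python as below; the function name `discrete_median` is hypothetical. It follows the text's rule for discrete series: take the first value whose less than cumulative frequency (the serial number of its last item) reaches (N + 1)/2. Note that this rule does not average two neighbouring items the way the ungrouped even-numbered example above does.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Value of the (N + 1)/2-th item, read from the less-than cumulative frequencies."""
    n = sum(freqs)
    pos = (n + 1) / 2
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= pos:   # first value whose last serial number reaches the position
            return value
    raise ValueError("empty distribution")


marks = [47, 51, 55, 59, 61]
students = [5, 2, 4, 2, 1]
print(discrete_median(marks, students))  # 55
```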


Median for continuous frequency distribution 

The procedure adopted for the calculation of the Median in the case of a continuous frequency distribution is different from the previous one. While we considered only the cumulative frequency in the previous case, here we consider not only the cumulative frequency but also the class limits. The cumulative frequency, the class limits, the frequency of the Median class and the class interval of the Median class are taken into consideration simultaneously.


Procedures to be followed 


1. We need the true class intervals and the class limits. First we must check whether the class intervals given are true. If not, we must convert them into true class intervals by suitably adjusting the lower and upper limits of the classes.

2. The total frequency should be calculated; let it be denoted by the letter N.

3. We should find half of the total frequency by dividing the total by 2. Let it be N/2.

4. We should also calculate the less than cumulative frequency for each class.

5. From the cumulative frequencies, we should locate the Median class, in which the Median value, i.e. the (N/2)th item, falls.

6. We should note the lower limit of the Median class as l or L.

7. We should note the frequency of the Median class as f.

8. We should note the class interval of the Median class and denote it by c.

9. We should calculate the less than cumulative frequency of the class preceding the Median class. Let it be m.

10. By the method of interpolation, the following formula may be adopted for calculation of the Median:

$$\text{Median} = l + \frac{(N/2 - m) \times c}{f}$$
Computation of the Median from grouped data

The Median can easily be calculated from the frequency distribution. We shall consider the weights of the same 50 bags for this purpose. We can count half of the total frequency from either end of the distribution to ascertain the value of the Median. For this purpose, we should consider the frequencies and the less than cumulative frequencies of the table.


Class (kg)       Frequency   Less than cumulative frequency
   (1)              (2)                 (3)
 60.5 - 70.5         1                   1
 70.5 - 80.5         5                   6
 80.5 - 90.5         9                  15
 90.5 - 100.5       14                  29
100.5 - 110.5       15                  44
110.5 - 120.5        4                  48
120.5 - 130.5        2                  50
Total               50


The total number of items in this problem is N = 50, so N/2 = 50/2 = 25.


From the less than cumulative frequency given in col. (3), we know that there are 15 bags up to the class 80.5 - 90.5. It is understood, therefore, that 15 bags weigh less than 90.5 kg. In order to have 25 bags (N/2) we require 10 more bags (25 - 15 = 10), and these 10 bags have to be taken from the next class, namely 90.5 - 100.5; hence this class is known as the Median class. The frequency of the Median class is 14 bags and its class interval is 10 kg. These 14 bags are taken to be spread evenly over a range of 10 kg. Therefore, one bag in this class occupies a distance of 10/14 kg; in other words, the interval between two successive bags will be 10/14 kg. We require 10 more bags, and these 10 bags will be spread over a distance of 10 x 10/14 = 100/14 = 7.1 kg. As this class starts with a lower limit of 90.5 kg, the 25th bag will occupy the position 90.5 + 7.1 = 97.6 kg. We shall now apply the formula and arrive at the Median value.


Median = M; N = 50; N/2 = 50/2 = 25.
l = 90.5 (lower limit of the Median class).
m = 15 (cumulative frequency up to the class preceding the Median class).
f = 14 (frequency of the Median class).
c = 10 (class interval of the Median class).

$$M = l + \frac{(N/2 - m) \times c}{f} = 90.5 + \frac{(50/2 - 15) \times 10}{14} = 90.5 + \frac{(25 - 15) \times 10}{14} = 90.5 + \frac{10 \times 10}{14} = 90.5 + 7.1 = 97.6 \text{ kg}$$
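The interpolation formula can be sketched in Python for classes of equal width; the function name `grouped_median` and its argument layout (a list of true lower limits plus one class interval c) are assumptions made for illustration.

```python
from itertools import accumulate


def grouped_median(lower_limits, freqs, c):
    """Median = l + (N/2 - m) * c / f for classes of equal width c."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))
    for i, cf in enumerate(cum):
        if cf >= half:                        # Median class: c.f. first reaches N/2
            m = cum[i - 1] if i > 0 else 0    # c.f. of the preceding class
            return lower_limits[i] + (half - m) * c / freqs[i]
    raise ValueError("empty distribution")


lowers = [60.5, 70.5, 80.5, 90.5, 100.5, 110.5, 120.5]
bags = [1, 5, 9, 14, 15, 4, 2]
print(round(grouped_median(lowers, bags, 10), 1))  # 97.6
```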


Type II

Let us calculate the Median from the following:

Fortnightly wages (Rs.)   No. of workers   Cumulative frequencies
        (1)                     (2)                 (3)
      21 - 30                    2                   2
      31 - 40                    5                   7
      41 - 50                   12                  19
      51 - 60                    9                  28
      61 - 70                    4                  32
      71 - 80                    2                  34
There is a difference between the previous type and the present type. In the previous type, the class intervals are true class intervals, whereas in this case they are not. Therefore, the class intervals have to be converted into true class intervals as follows:


Fortnightly wages (Rs.)   No. of workers   Cumulative frequency
        (1)                     (2)                 (3)
    20.5 - 30.5                  2                   2
    30.5 - 40.5                  5                   7
    40.5 - 50.5                 12                  19
    50.5 - 60.5                  9                  28
    60.5 - 70.5                  4                  32
    70.5 - 80.5                  2                  34
       Total                    34


N = 34; N/2 = 34/2 = 17.
Median class = 40.5 - 50.5; c = 50.5 - 40.5 = 10; l = 40.5; f = 12; m = 7.

$$M = l + \frac{(N/2 - m) \times c}{f} = 40.5 + \frac{(17 - 7) \times 10}{12} = 40.5 + \frac{100}{12} = 40.5 + 8.3 = \text{Rs. } 48.8$$
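Converting the stated limits into true class limits can be added as a preprocessing step. This Python sketch assumes a gap of 1 between the upper limit of one class and the lower limit of the next (as in 21 - 30, 31 - 40); the function names `true_class_limits` and `grouped_median` are hypothetical, and here the class interval is read from each true class.

```python
from itertools import accumulate


def true_class_limits(stated, gap=1):
    """Widen each stated class by half the gap between classes: 21-30 -> 20.5-30.5."""
    return [(lo - gap / 2, hi + gap / 2) for lo, hi in stated]


def grouped_median(classes, freqs):
    """Median = l + (N/2 - m) * c / f, with c taken from the Median class itself."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    for i, (lo, hi) in enumerate(classes):
        if cum[i] >= n / 2:
            m = cum[i - 1] if i > 0 else 0
            return lo + (n / 2 - m) * (hi - lo) / freqs[i]
    raise ValueError("empty distribution")


wages = [(21, 30), (31, 40), (41, 50), (51, 60), (61, 70), (71, 80)]
workers = [2, 5, 12, 9, 4, 2]
classes = true_class_limits(wages)
print(classes[2])                                  # (40.5, 50.5)
print(round(grouped_median(classes, workers), 1))  # 48.8
```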
Type III

Mid-value   Frequency
   (1)         (2)
   15           7
   25           6
   35           9
   45           4
   55           5
   65           4
   75           7
   85           4
 Total         46


In this example the mid-values of the classes are given instead of the class limits, so the limits of the various classes have to be determined from the mid-values. In order to fix the class limits, we require the class interval. The interval between two successive mid-values will be the class interval; here the difference between successive mid-values is 10, so 10 can be taken as the class interval. Half the class interval is subtracted from the mid-value to find the lower limit, and half the class interval is added to the mid-value to find the upper limit. Thus the true class intervals can be calculated as follows:


Class interval   Frequency   Cumulative frequency
     (1)            (2)              (3)
   10 - 20           7                7
   20 - 30           6               13
   30 - 40           9               22
   40 - 50           4               26
   50 - 60           5               31
   60 - 70           4               35
   70 - 80           7               42
   80 - 90           4               46
    Total           46


N = 46; N/2 = 23; m = 22. Median class = 40 - 50.
l = 40; c = 10; f = 4.

$$M = l + \frac{(N/2 - m) \times c}{f} = 40 + \frac{(23 - 22) \times 10}{4} = 40 + 2.5 = 42.5$$
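Deriving the classes from the mid-values can be sketched in Python, assuming the mid-values are equally spaced; `classes_from_mid_values` and `grouped_median` are hypothetical names, and the Median formula is the same interpolation as before.

```python
from itertools import accumulate


def classes_from_mid_values(mids):
    """Class interval = difference of successive mid-values; limits = mid -/+ half of it."""
    c = mids[1] - mids[0]
    return [(x - c / 2, x + c / 2) for x in mids]


def grouped_median(classes, freqs):
    """Median = l + (N/2 - m) * c / f, with c taken from the Median class itself."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    for i, (lo, hi) in enumerate(classes):
        if cum[i] >= n / 2:
            m = cum[i - 1] if i > 0 else 0
            return lo + (n / 2 - m) * (hi - lo) / freqs[i]
    raise ValueError("empty distribution")


mids = [15, 25, 35, 45, 55, 65, 75, 85]
freqs = [7, 6, 9, 4, 5, 4, 7, 4]
classes = classes_from_mid_values(mids)
print(classes[0], classes[-1])         # (10.0, 20.0) (80.0, 90.0)
print(grouped_median(classes, freqs))  # 42.5
```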


Type IV 


Calculation of Median when cumulative frequencies are given without frequencies


Let us consider the following example . 


Marks less than   Cumulative frequency
      10                   3
      20                   5
      30                   8
      40                  12
      50                  20
      60                  25


This problem is slightly different from the previous one. In this problem, only the upper limits of the classes are given. Therefore, we have to find the lower limit of each class and establish each class. Since we are given the cumulative frequencies, we should calculate the actual frequencies by subtracting the cumulative frequency of the preceding class from the cumulative frequency of each class, and construct the table:


Class     Cumulative frequency   Frequency
0 - 10             3              3 - 0 = 3
10 - 20            5              5 - 3 = 2
20 - 30            8              8 - 5 = 3
30 - 40           12             12 - 8 = 4
40 - 50           20            20 - 12 = 8
50 - 60           25            25 - 20 = 5
Total                                  25


The remaining steps for calculating the Median are the same as before.


N = 25; N/2 = 25/2 = 12.5. Median class = 40 - 50.
l = 40; m = 12; f = 8; c = 10.

$$M = l + \frac{(N/2 - m) \times c}{f} = 40 + \frac{(25/2 - 12) \times 10}{8} = 40 + \frac{0.5 \times 10}{8} = 40 + 0.6 = 40.6$$
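Recovering the frequencies from the cumulative frequencies can be sketched in Python as below. The function names are hypothetical, and the sketch assumes classes of equal width c (10 here), so that each lower limit is the upper limit minus c.

```python
def frequencies_from_cumulative(cum):
    """Frequency of each class = its c.f. minus the c.f. of the preceding class."""
    return [cf - prev for cf, prev in zip(cum, [0] + cum[:-1])]


def median_from_less_than(upper_limits, cum, c):
    """Median from 'less than' upper limits and c.f., for classes of equal width c."""
    freqs = frequencies_from_cumulative(cum)
    half = cum[-1] / 2
    i = next(k for k, cf in enumerate(cum) if cf >= half)  # Median class
    lower = upper_limits[i] - c                            # its lower limit
    m = cum[i - 1] if i > 0 else 0
    return lower + (half - m) * c / freqs[i]


upper_limits = [10, 20, 30, 40, 50, 60]   # marks less than
cum = [3, 5, 8, 12, 20, 25]
print(frequencies_from_cumulative(cum))              # [3, 2, 3, 4, 8, 5]
print(median_from_less_than(upper_limits, cum, 10))  # 40.625
```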


Computation of Median from graphs 

We can also find the Median from a graph. As we require the less than cumulative frequency, we can compute the Median from the graph (ogive) of the less than cumulative frequency. In the graph, the values of the variable are plotted on the X-axis and the cumulative frequencies on the Y-axis. As the Median value relates to the middle of the distribution (N/2), we locate the point on the Y-axis corresponding to N/2. From this point on the Y-axis, a line parallel to the X-axis is drawn to cut the ogive curve. From this point of intersection on the ogive curve, a line parallel to the Y-axis and perpendicular to the X-axis is drawn, cutting the X-axis. The point of intersection on the X-axis indicates the Median value.

[FIG. 12: Median]


[FIG. 13: Median]
We can use the curve for either the less than cumulative frequency or the greater than cumulative frequency; in both cases the procedure is the same. We can also use both curves simultaneously: from the point of intersection of the less than and greater than curves, we draw a perpendicular to the X-axis. Whichever curve we use, the value of N/2 is the same, and the line drawn from the point on the Y-axis corresponding to N/2 passes through the point where the two ogives intersect. The foot of the perpendicular on the X-axis gives the Median value.


Properties of the Median
1. An important property of the Median is that it is influenced by the position of the items in the array and not by the values or sizes of the extreme items.

2. The Median will be expressed in the same unit of measurement as the original items in the distribution.


QUARTILES


We have seen that the Median divides the units in the distribution into two equal halves. There are other measures, similar to the Median, which divide the distribution into more parts. We can divide the distribution into four, five, 10 or 100 equal parts. For each kind of division we have a different measure, and these are also called measures of central tendency.


Quartiles are another set of measures of central tendency which divide the distribution into 4 equal parts. To divide the distribution into four equal parts, we need three points of location, and the values of these dividing points are called Quartiles. Since there are three dividing points, they are designated by the order of their occurrence, namely the First Quartile (Q1), the Second Quartile (Q2) and the Third Quartile (Q3). The Second Quartile is the same as the Median.


First Quartile or Lower Quartile

The first quartile divides the distribution into two unequal parts such that 1/4th of the items (25% of the units or individuals) have values less than the first quartile, and the remaining 3/4th (75%) of the units, individuals or classes have values greater than the first quartile.


I. Calculation of the lower quartile from ungrouped data

1. The given items or values should be arranged in ascending order of their magnitude.

2. The total number of items may be denoted by the letter n.

3. The lower quartile (Q1) is the value of the $\frac{n+1}{4}$th item.


If there are seven values as follows:

40, 25, 35, 50, 60, 20, 75, they have to be re-arranged in ascending order of magnitude as follows:

20, 25, 35, 40, 50, 60, 75.

n = 7; n + 1 = 7 + 1 = 8; (n + 1)/4 = 8/4 = 2.

The second item represents Q1. Therefore, the value of the second item, i.e. 25, is Q1.


In case (n + 1) is not exactly divisible by 4, the following procedure may be adopted.

Suppose there are 10 items in an example:
20, 25, 35, 40, 60, 70, 72, 79, 80, 85.
n = 10; n + 1 = 11; (n + 1)/4 = 11/4 = 2.75 (2¾).

The lower quartile lies between the second and the third items.
Value of the second item = 25.
Value of the third item = 35.
Difference between the second and third values = 35 - 25 = 10.
3/4th of the difference between the second and third items = 10 x 3/4 = 7.5.

Q1 = value of the second item + 3/4th of the difference between the second and third items
   = 25 + 7.5 = 32.5


If there are eight items in the series, the value of the (8 + 1)/4th, i.e. the 2.25th, item will be the lower quartile. In other words, it will be equal to the value of the second item plus 1/4th of the difference between the values of the second and third items.


If there are 9 items in the series, the value of the (9 + 1)/4 = 2.5th item will be the lower quartile. This is equal to the value of the second item plus half the difference between the values of the second and third items.
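The interpolation rule for the lower quartile can be sketched in Python; the function name `lower_quartile` is hypothetical, and the sketch assumes at least three items so that the (n + 1)/4th position is not below the first item.

```python
def lower_quartile(values):
    """Q1 as the value of the (n + 1)/4-th item, interpolating between neighbours."""
    xs = sorted(values)
    if len(xs) < 3:
        raise ValueError("need at least three items")
    pos = (len(xs) + 1) / 4   # e.g. 2, 2.25, 2.5 or 2.75
    k = int(pos)              # item at or just before the position
    frac = pos - k            # fraction of the gap to the next item
    if frac == 0:
        return xs[k - 1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


print(lower_quartile([40, 25, 35, 50, 60, 20, 75]))              # 25
print(lower_quartile([20, 25, 35, 40, 60, 70, 72, 79, 80, 85]))  # 32.5
```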


II. Calculation of lower quartile from discrete frequency distribution


The following procedure may be adopted:

1. The given frequency distribution should be converted into less than cumulative frequencies.

2. The total of all the frequencies will be denoted by N.

3. The value of (N + 1)/4 is found; the value of the (N + 1)/4th item will be the lower quartile.
Value of the items   Frequency   Cumulative frequency
       25                3               3
       35                4               7
       45                2               9
       55                5              14
     Total              14

N = 14; (N + 1)/4 = 15/4 = 3.75.
Q1 = value of the 3¾th item.

All the items beyond the 3rd and up to the 7th have values equal to 35.

Therefore, the 3¾th item is 35, i.e. Q1 = 35.
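Locating an item from the cumulative frequencies can be sketched in Python as follows; the function name `value_at_position` is hypothetical. It returns the first value whose less than cumulative frequency reaches the required position, here (N + 1)/4.

```python
from itertools import accumulate


def value_at_position(values, freqs, pos):
    """Value of the pos-th item: the first value whose less-than c.f. reaches pos."""
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= pos:
            return value
    raise ValueError("position beyond the total frequency")


values = [25, 35, 45, 55]
freqs = [3, 4, 2, 5]
n = sum(freqs)
print((n + 1) / 4)                                    # 3.75
print(value_at_position(values, freqs, (n + 1) / 4))  # 35
```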


III. Calculation of lower quartile from continuous frequency distribution

1. The frequencies should be converted into less than cumulative frequencies.

2. The total of all the frequencies (N) should be calculated and divided by 4 (N/4) to find 1/4th of the total frequency.

3. We should find the lower quartile class, in which Q1 lies.

4. The true lower limit of the lower quartile class should be determined (l).

5. The frequency of the lower quartile class should be noted (f).

6. The less than cumulative frequency of the class preceding the lower quartile class should be noted (m).

7. The class interval of the lower quartile class should be noted (c).


Q1 = lower quartile;
l = lower limit of the lower quartile class;
f = frequency of the lower quartile class;
m = cumulative frequency of the class previous to the first quartile class;
c = class interval of the first quartile class.

The following formula can be used for calculating the lower quartile:

$$Q_1 = l + \frac{(N/4 - m) \times c}{f}$$


Class     Frequency   Cumulative frequency
0 - 5         3                3
5 - 10        4                7
10 - 15       2                9
15 - 20       5               14
20 - 25       1               15

N = 15; N/4 = 15/4 = 3.75.

First quartile class = 5 - 10.
l = 5; f = 4; m = 3; c = 5.

$$Q_1 = 5 + \frac{(15/4 - 3) \times 5}{4} = 5 + \frac{0.75 \times 5}{4} = 5 + 0.94 = 5.94$$
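The lower quartile formula for grouped data can be sketched in Python, assuming classes of equal width c; the function name `grouped_lower_quartile` is hypothetical. It finds the class in which the (N/4)th item lies and interpolates within it.

```python
from itertools import accumulate


def grouped_lower_quartile(lower_limits, freqs, c):
    """Q1 = l + (N/4 - m) * c / f for classes of equal width c."""
    n = sum(freqs)
    target = n / 4
    cum = list(accumulate(freqs))
    for i, cf in enumerate(cum):
        if cf >= target:                    # lower quartile class
            m = cum[i - 1] if i > 0 else 0  # c.f. of the preceding class
            return lower_limits[i] + (target - m) * c / freqs[i]
    raise ValueError("empty distribution")


print(grouped_lower_quartile([0, 5, 10, 15, 20], [3, 4, 2, 5, 1], 5))  # 5.9375, i.e. about 5.94
```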


IV. Calculation from the graph

As in the case of the Median, the Quartiles can be computed from the ogive curve. We should draw the ogive curve for the less than cumulative frequency. The values are plotted on the X-axis and the cumulative frequencies on the Y-axis. As the first (lower) quartile Q1 divides the distribution in the ratio 1 : 3, we should select the point on the Y-axis corresponding to the frequency N/4. From this point on the Y-axis, a straight line parallel to the X-axis should be drawn to cut the ogive curve. From this point of intersection on the ogive curve, a straight line perpendicular to the X-axis (parallel to the Y-axis) should be drawn to cut the X-axis. This point of intersection on the X-axis indicates the location of the first quartile, and its value on the X-axis gives the value of the first quartile.

[FIG. 14: First Quartile]
Properties
1. The first quartile will be expressed in the same unit of measurement as the original items in the distribution.
2. The first quartile is influenced by the position of the items in the array and not by the values or sizes of the extreme items.


Third Quartile (Q3) or Upper Quartile

The upper quartile divides the distribution into two unequal parts so that one fourth (1/4) of the items have values greater than Q3 and three fourths (3/4) of the items have values less than Q3. In fact, it is just the opposite of Q1. It can be computed in the same way as the first quartile.


I. Calculation of upper quartile from ungrouped data

The following method is adopted:
(i) The given items should be arranged in ascending order of their magnitude.
(ii) The total number of items should be found and denoted by n.
(iii) The following formula may be used:

Q3 = value of the 3(n + 1)/4th item, i.e. the 3/4 (n + 1)th item.


Let us calculate the upper quartile of:

29, 35, 26, 44, 40, 55.

Let us re-arrange them in ascending order:

26, 29, 35, 40, 44, 55.

The total number of items n = 6.

Q3 = value of the 3/4 (n + 1)th item; 3(n + 1)/4 = 3(6 + 1)/4 = (3 x 7)/4 = 21/4 = 5.25.

We have to find the value of the 5¼th item:

Q3 = value of the 5th item + 1/4 (value of the 6th item - value of the 5th item).

Value of the 5th item = 44.
Value of the 6th item = 55.
Difference between the 6th and 5th items = 55 - 44 = 11.
1/4th of the difference = 11/4 = 2.75.

Therefore, Q3 = value of the 5th item + 1/4th of the difference = 44 + 2.75 = 46.75.


(1) Calculate the upper quartile for the following data:

29, 35, 26, 40, 58, 44, 55.

First re-arrange them in ascending order:
26, 29, 35, 40, 44, 55, 58.

n = 7. Q3 = value of the 3(n + 1)/4th item; 3(7 + 1)/4 = (3 x 8)/4 = 6, i.e. the 6th item.

Value of the 6th item = 55.

Therefore, Q3 = 55.
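Both ungrouped rules, (n + 1)/4 for Q1 and 3(n + 1)/4 for Q3, fit one Python sketch with a parameter k; the function name `quartile` is hypothetical, and k = 2 reproduces the (n + 1)/2 rule for the Median. It assumes at least three items.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3): value of the k(n + 1)/4-th item, interpolated."""
    xs = sorted(values)
    if len(xs) < 3:
        raise ValueError("need at least three items")
    pos = k * (len(xs) + 1) / 4
    whole = int(pos)          # item at or just before the position
    frac = pos - whole        # e.g. 0.25 for the 5.25th item
    if frac == 0:
        return xs[whole - 1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])


print(quartile([29, 35, 26, 44, 40, 55], 3))      # 46.75
print(quartile([29, 35, 26, 40, 58, 44, 55], 3))  # 55
```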


II. Calculation of upper quartile from discrete frequency distribution

1. For calculation of the upper quartile for a discrete frequency distribution, we should convert the frequencies into less than cumulative frequencies.

2. The total of all the frequencies may be denoted by the letter N.

3. The same formula, i.e. 3(N + 1)/4, can be used.

We shall calculate the third quartile for the following distribution:

Value   Frequency
  25        3
  35        4
  45        2
  55        5
Total      14


We should first calculate the less than the cumulative fre 
quency 


Value 


Cumulative frequency 


Frequency 

( 2 ) 


( 1 ) 


( 3 ) 


25 


3 


3 


35 


4 


7 


45 


2 


9 


! 


55 


5 


14 


Total frequency N = 14 . 


i 


N+ 1 


14 + 1 = 15 . 


* ( N + 1) x 15 45/4 - 11.25 


55 


It is seen from col( 3) of the above table that all the items 
after 9 and upto 14 are having values 55. As the 111 ( 11.25 ) th 
item lies after 9 , lg is equal to 55 . 
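The reading of col. (3) in this example amounts to walking down the cumulative frequencies until the position 3(N + 1)/4 is reached or passed. A minimal sketch, assuming the values are listed in ascending order; the function names are hypothetical.

```python
from itertools import accumulate


def value_at_position_discrete(values, freqs, pos):
    """Value of the pos-th item of a discrete distribution: the first value
    whose less than cumulative frequency reaches (or passes) pos."""
    for value, cum in zip(values, accumulate(freqs)):
        if cum >= pos:
            return value
    raise ValueError("position exceeds the total frequency")


def upper_quartile_discrete(values, freqs):
    """Q3 = value of the 3(N + 1)/4 th item."""
    pos = 3 * (sum(freqs) + 1) / 4       # 11.25 for N = 14
    return value_at_position_discrete(values, freqs, pos)


print(upper_quartile_discrete([25, 35, 45, 55], [3, 4, 2, 5]))  # 55
```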


III. Calculation of upper quartile from continuous frequency distribution

In the case of a continuous frequency distribution, the formula adopted is slightly different from the formula adopted for a discrete frequency distribution; however, the procedures are the same. The position used is $3N/4$ instead of $3(N+1)/4$.

Methods followed

1. The frequencies should be converted into less than cumulative frequencies.

2. The total of the frequencies should be found out and indicated by the letter N.

3. N should be divided by 4 and multiplied by 3, or multiplied by 3 and then divided by 4. In other words, it should be multiplied by 3/4, giving $3N/4$.

4. We should find out the class in which $Q_3$ (the $3N/4$th item) lies; this class is known as the upper quartile or third quartile class.

5. The true lower limit of the upper quartile class should be ascertained ($l$).

6. The frequency of the upper quartile class should be noted ($f$).

7. The less than cumulative frequency of the class preceding the upper quartile class should be noted ($m$).

8. The class interval of the upper quartile class should be noted ($c$).

$$\text{Upper quartile: } Q_3 = l + \frac{(3N/4 - m) \times c}{f}$$

where
$Q_3$ = upper quartile;
$l$ = lower limit of the upper quartile class;
$N$ = total frequency;
$m$ = less than cumulative frequency of the class previous to the upper quartile class;
$c$ = class interval of the upper quartile class;
$f$ = frequency of the upper quartile class.


Calculate the upper quartile for the following distribution:

Class      Frequency    Less than cumulative frequency
(1)        (2)          (3)
0 - 5      3            3
5 - 10     4            7
10 - 15    2            9
15 - 20    5            14
20 - 25    1            15
Total      15

Total frequency N = 15.

$$\frac{3N}{4} = \frac{3 \times 15}{4} = \frac{45}{4} = 11.25\text{th item}$$

We need the value of the 11.25th item. We know from col. (3) of the above table that the 11.25th item lies in the class 15 - 20, and this is the upper quartile class.

Upper quartile class = 15 - 20.
The true lower limit of the upper quartile class ($l$) = 15.
Frequency of the upper quartile class ($f$) = 5.
Class interval of the upper quartile class ($c$) = 20 - 15 = 5.
The less than cumulative frequency of the class preceding the upper quartile class ($m$) = 9.

$$Q_3 = l + \frac{(3N/4 - m) \times c}{f} = 15 + \frac{(45/4 - 9) \times 5}{5} = 15 + \frac{(11.25 - 9) \times 5}{5} = 15 + \frac{2.25 \times 5}{5} = 15 + 2.25 = 17.25$$

Upper quartile = 17.25
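The eight steps above can be sketched as a short function. This is an illustrative sketch, not the book's own method statement: it assumes classes of equal width c listed in ascending order, and the function name is hypothetical.

```python
def upper_quartile_continuous(lower_limits, width, freqs):
    """Q3 = l + (3N/4 - m) * c / f, for classes of equal width c."""
    target = 3 * sum(freqs) / 4          # position of the Q3 item
    m = 0                                # cum. frequency of preceding classes
    for l, f in zip(lower_limits, freqs):
        if f > 0 and m + f >= target:    # first class reaching 3N/4
            return l + (target - m) * width / f
        m += f
    raise ValueError("3N/4 not reached; check the frequencies")


print(upper_quartile_continuous([0, 5, 10, 15, 20], 5, [3, 4, 2, 5, 1]))  # 17.25
```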


IV. Calculation of upper quartile from the graph

As in the case of the Median and the Lower quartile, the upper quartile can also be computed from the Ogive curve. We should draw the Ogive curve for the less than cumulative frequency distribution. The values corresponding to the upper limits of the various classes are plotted on the X-axis and the cumulative frequencies are plotted on the Y-axis. As the third quartile divides the distribution in the ratio 3 : 1, we should select the point of location on the Y-axis corresponding to 3N/4. From this point on the Y-axis, we should draw a straight line parallel to the X-axis cutting the Ogive curve. From this point of intersection on the Ogive curve, we should draw a straight line perpendicular to the X-axis (or parallel to the Y-axis) to cut the X-axis. The point of intersection on the X-axis corresponds to the $Q_3$ item. Therefore, the value of X corresponding to this point of intersection on the X-axis is the upper quartile or third quartile.

[Fig. 15: Third quartile]
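If the Ogive is drawn by joining the plotted points with straight lines (an assumption; a freehand smooth curve would give a slightly different reading), reading off X at height 3N/4 is the same as straight-line interpolation between two plotted points, which is why the graph agrees with the formula. The sketch assumes each less than cumulative frequency is plotted against the upper limit of its class, with the curve starting at (0, 0); `read_ogive` is a hypothetical helper name.

```python
def read_ogive(points, y):
    """X value where a straight-line ogive through `points` reaches height y.
    `points` are (upper limit, less than cumulative frequency) pairs in
    ascending order, starting with (lowest limit, 0)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < y <= y1:                 # segment that crosses height y
            return x0 + (y - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("y is outside the range of the ogive")


# Ogive for the upper quartile table: classes 0-5, 5-10, ..., 20-25.
ogive = [(0, 0), (5, 3), (10, 7), (15, 9), (20, 14), (25, 15)]
n_total = ogive[-1][1]
print(read_ogive(ogive, 3 * n_total / 4))  # 17.25
```

The result matches the formula answer $Q_3 = 17.25$ for the same table.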


Properties
1. The third quartile will be expressed in the same unit of measurement as the original values in the distribution.

2. The third quartile is influenced by the position of the items in the array and not by the value or size of the items.

We have referred to the lower quartile as the first quartile ($Q_1$) and the upper quartile as the third quartile ($Q_3$). Naturally, we may be tempted to ask about the missing quartile, namely the second quartile. The second quartile is nothing but the Median ($Q_2$).

QUINTILES

We have stated earlier that a distribution can be divided into four equal parts by the first, second and third quartiles. In the same way, the given distribution can be divided into 5 equal parts by the various Quintiles, namely the First, Second, Third and Fourth quintiles.

The various quintiles can be calculated for a given series of data as in the case of quartiles. The procedures are more or less the same; only the points of location differ, and consequently the formulae have to be altered slightly.

For ungrouped data and discrete frequency distribution, the following formulae have to be adopted (for a discrete frequency distribution, n is replaced by N):

First Quintile = Value of the $(n+1)/5$th item
Second Quintile = Value of the $2(n+1)/5$th item
Third Quintile = Value of the $3(n+1)/5$th item
Fourth Quintile = Value of the $4(n+1)/5$th item

For continuous frequency distribution, the following formulae can be followed:

$$\text{First Quintile} = l + \frac{(N/5 - m) \times c}{f}$$

$$\text{Second Quintile} = l + \frac{(2N/5 - m) \times c}{f}$$

$$\text{Third Quintile} = l + \frac{(3N/5 - m) \times c}{f}$$

$$\text{Fourth Quintile} = l + \frac{(4N/5 - m) \times c}{f}$$

These different quintiles can be calculated from the graph also, by drawing straight lines parallel to the X-axis to cut the less than cumulative frequency curve. These straight lines should be drawn from the points on the Y-axis corresponding to the frequencies N/5, 2N/5, 3N/5 and 4N/5 for the first, second, third and fourth quintiles respectively.
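The quartile and quintile formulae for a continuous distribution differ only in the fraction of N that is located (3/4, 1/5, 2/5, and so on). A single illustrative function can therefore cover them; `partition_value` and its arguments k (which dividing point) and q (number of equal parts) are names introduced here, not taken from the text, and equal class widths are assumed.

```python
def partition_value(lower_limits, width, freqs, k, q):
    """k-th of the q - 1 dividing points: l + (kN/q - m) * c / f.
    q = 4 gives quartiles, q = 5 quintiles (assumed equal class widths)."""
    target = k * sum(freqs) / q
    m = 0
    for l, f in zip(lower_limits, freqs):
        if f > 0 and m + f >= target:
            return l + (target - m) * width / f
        m += f
    raise ValueError("target position not found")


limits, freqs = [0, 5, 10, 15, 20], [3, 4, 2, 5, 1]
print([partition_value(limits, 5, freqs, k, 5) for k in range(1, 5)])
# [5.0, 8.75, 15.0, 18.0]
print(partition_value(limits, 5, freqs, 3, 4))  # 17.25
```

The second call reproduces the worked answer $Q_3 = 17.25$ for the upper quartile table; the first line simply applies the four quintile formulae to the same table.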


Properties
1. The quintiles will be expressed in the same unit of measurement as the original items in the distribution.
2. They are influenced by the position of the items and not by the value or size of the items.

DECILES

We can divide the distribution into 10 equal parts. In this process we get 9 dividing points or positions, and they are called Deciles.

These nine deciles are generally denoted by the letters $D_1, D_2, D_3, D_4, D_5, D_6, D_7, D_8$ and $D_9$.

I. Calculation of deciles for ungrouped data

The given items should first be arranged in the order of their magnitude. The total number of items in the array should be indicated by the letter n. Then (n + 1) should be found out and divided by 10, giving (n + 1)/10. The quotient obtained should be multiplied by the number of the respective decile. After this, the respective deciles can be calculated with the help of the formula noted against each:

$D_1$ = Value of the $(n+1)/10$th item;
$D_2$ = Value of the $2(n+1)/10$th item;
$D_3$ = Value of the $3(n+1)/10$th item;
$D_4$ = Value of the $4(n+1)/10$th item;
$D_5$ = Value of the $5(n+1)/10$th item;
$D_6$ = Value of the $6(n+1)/10$th item;
$D_7$ = Value of the $7(n+1)/10$th item;
$D_8$ = Value of the $8(n+1)/10$th item;
$D_9$ = Value of the $9(n+1)/10$th item.


Let us consider the following items and work out the various deciles:

26, 15, 20, 18, 25, 17, 29, 40, 35, 28, 31, 42, 50, 53, 60, 65.

These values should first be arranged in the order of their magnitude as follows:

15, 17, 18, 20, 25, 26, 28, 29, 31, 35, 40, 42, 50, 53, 60, 65.

The total number of items, n = 16.

∴ n + 1 = 17 and (n + 1)/10 = 17/10 = 1.7.

$D_1$ = Value of the $1(n+1)/10 = 17/10 = 1.7$th item
$D_2$ = Value of the $2(n+1)/10 = (2 \times 17)/10 = 3.4$th item
$D_3$ = Value of the $3(n+1)/10 = (3 \times 17)/10 = 5.1$th item
$D_4$ = Value of the $4(n+1)/10 = (4 \times 17)/10 = 6.8$th item
$D_5$ = Value of the $5(n+1)/10 = (5 \times 17)/10 = 8.5$th item
$D_6$ = Value of the $6(n+1)/10 = (6 \times 17)/10 = 10.2$th item
$D_7$ = Value of the $7(n+1)/10 = (7 \times 17)/10 = 11.9$th item
$D_8$ = Value of the $8(n+1)/10 = (8 \times 17)/10 = 13.6$th item
$D_9$ = Value of the $9(n+1)/10 = (9 \times 17)/10 = 15.3$th item

$$D_1 = \text{1st item} + 0.7 \times (\text{2nd item} - \text{1st item}) = 15 + 0.7(17 - 15) = 15 + 0.7 \times 2 = 15 + 1.4 = 16.4$$

$$D_2 = \text{3rd item} + 0.4 \times (\text{4th item} - \text{3rd item}) = 18 + 0.4(20 - 18) = 18 + 0.8 = 18.8$$

$$D_3 = \text{5th item} + 0.1 \times (\text{6th item} - \text{5th item}) = 25 + 0.1(26 - 25) = 25 + 0.1 = 25.1$$

$$D_4 = \text{6th item} + 0.8 \times (\text{7th item} - \text{6th item}) = 26 + 0.8(28 - 26) = 26 + 1.6 = 27.6$$

$$D_5 = \text{8th item} + 0.5 \times (\text{9th item} - \text{8th item}) = 29 + 0.5(31 - 29) = 29 + 1.0 = 30$$

$$D_6 = \text{10th item} + 0.2 \times (\text{11th item} - \text{10th item}) = 35 + 0.2(40 - 35) = 35 + 1.0 = 36$$

$$D_7 = \text{11th item} + 0.9 \times (\text{12th item} - \text{11th item}) = 40 + 0.9(42 - 40) = 40 + 1.8 = 41.8$$

$$D_8 = \text{13th item} + 0.6 \times (\text{14th item} - \text{13th item}) = 50 + 0.6(53 - 50) = 50 + 1.8 = 51.8$$

$$D_9 = \text{15th item} + 0.3 \times (\text{16th item} - \text{15th item}) = 60 + 0.3(65 - 60) = 60 + 1.5 = 61.5$$
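All nine deciles of the example can be reproduced with a small function applying the $k(n+1)/10$ rule and the same straight-line interpolation between neighbouring items. A minimal sketch; `decile_ungrouped` is a hypothetical name, and positions outside 1..n are rejected because the text does not cover them.

```python
def decile_ungrouped(values, k):
    """D_k = value of the k(n + 1)/10 th item of the ascending array,
    interpolating linearly for a fractional position."""
    data = sorted(values)
    pos = k * (len(data) + 1) / 10
    if not 1 <= pos <= len(data):
        raise ValueError("decile position lies outside the array")
    whole = int(pos)                 # e.g. 3 for the 3.4th item
    frac = pos - whole               # e.g. 0.4
    if frac == 0:
        return data[whole - 1]
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])


items = [26, 15, 20, 18, 25, 17, 29, 40, 35, 28, 31, 42, 50, 53, 60, 65]
print([round(decile_ungrouped(items, k), 2) for k in range(1, 10)])
# [16.4, 18.8, 25.1, 27.6, 30.0, 36.0, 41.8, 51.8, 61.5]
```

Rounding is applied only for display, since decimal fractions such as 0.7 are not exact in binary floating point.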
II. Calculation of deciles for discrete frequency distribution

1. In the case of a discrete frequency distribution, the less than cumulative frequency should first be calculated.

2. The total number of items, or the total of all the frequencies, should be indicated by N.

Afterwards, the deciles are calculated with the help of the following formulae:

$D_1$ = Value of the $1(N+1)/10$th item;
$D_2$ = Value of the $2(N+1)/10$th item;
$D_3$ = Value of the $3(N+1)/10$th item;
$D_4$ = Value of the $4(N+1)/10$th item;
$D_5$ = Value of the $5(N+1)/10$th item;
$D_6$ = Value of the $6(N+1)/10$th item;
$D_7$ = Value of the $7(N+1)/10$th item;
$D_8$ = Value of the $8(N+1)/10$th item;
$D_9$ = Value of the $9(N+1)/10$th item.


We shall consider the following example:

Value    Frequency    Cumulative frequency
25       4            4
35       3            7
45       6            13
55       7            20
65       3            23
75       1            24

N = 24; N + 1 = 24 + 1 = 25; (N + 1)/10 = 25/10 = 2.5.

As in the quartile example, each position is read from the cumulative frequency column: the value is that of the first row whose cumulative frequency reaches or passes the position.

$D_1 = 1(N+1)/10 = 1(24+1)/10 = 25/10 = 2.5$th item. The value of the 2.5th item = 25.

$D_2 = 2(N+1)/10 = (2 \times 25)/10 = 5$th item. Value of the 5th item = 35.

$D_3 = 3(N+1)/10 = (3 \times 25)/10 = 7.5$th item. Value of the 7.5th item = 45.

$D_4 = 4(N+1)/10 = (4 \times 25)/10 = 10$th item. Value of the 10th item = 45.

$D_5 = 5(N+1)/10 = (5 \times 25)/10 = 12.5$th item. Value of the 12.5th item = 45.

$D_6 = 6(N+1)/10 = (6 \times 25)/10 = 15$th item. Value of the 15th item = 55.

$D_7 = 7(N+1)/10 = (7 \times 25)/10 = 17.5$th item. Value of the 17.5th item = 55.

$D_8 = 8(N+1)/10 = (8 \times 25)/10 = 20$th item. Value of the 20th item = 55.

$D_9 = 9(N+1)/10 = (9 \times 25)/10 = 22.5$th item. Value of the 22.5th item = 65.
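The cumulative-frequency reading used for each decile above can be sketched as follows. This is an illustration only; `decile_discrete` is a hypothetical name, and values are assumed to be listed in ascending order. For a fractional position lying between two different values (the 7.5th item lies between 35 and 45) the rule takes the later value, which matches the answer 45 given for $D_3$.

```python
from itertools import accumulate


def decile_discrete(values, freqs, k):
    """D_k = value of the k(N + 1)/10 th item: the first value whose
    less than cumulative frequency reaches that position."""
    pos = k * (sum(freqs) + 1) / 10
    for value, cum in zip(values, accumulate(freqs)):
        if cum >= pos:
            return value
    raise ValueError("position exceeds the total frequency")


values = [25, 35, 45, 55, 65, 75]
freqs = [4, 3, 6, 7, 3, 1]
print([decile_discrete(values, freqs, k) for k in range(1, 10)])
# [25, 35, 45, 45, 45, 55, 55, 55, 65]
```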


III. Calculation of deciles from continuous frequency distribution

We can also calculate deciles for a continuous frequency distribution as we have calculated the Median, Quartiles and Quintiles. The only change introduced in the formula is the substitution of N/10 (and its multiples 2N/10, 3N/10, ..., 9N/10) for N/2 in the case of the median, for N/4 in the case of the first quartile, for 3N/4 in the case of the third quartile, etc.

The procedures adopted are as follows:
1. The cumulative frequencies should be calculated.
2. The total frequency should be denoted by the letter N.
3. The true lower limit of the class should also be fixed.

The formulae will emerge as follows:

$$D_1 = l + \frac{(N/10 - m) \times c}{f}$$

$$D_2 = l + \frac{(2N/10 - m) \times c}{f}$$

$$D_3 = l + \frac{(3N/10 - m) \times c}{f}$$

$$D_4 = l + \frac{(4N/10 - m) \times c}{f}$$

$$D_5 = l + \frac{(5N/10 - m) \times c}{f}$$

$$D_6 = l + \frac{(6N/10 - m) \times c}{f}$$

$$D_7 = l + \frac{(7N/10 - m) \times c}{f}$$

$$D_8 = l + \frac{(8N/10 - m) \times c}{f}$$

$$D_9 = l + \frac{(9N/10 - m) \times c}{f}$$

In these formulae:
$l$ = lower limit of the respective decile class;
$c$ = class interval of the respective decile class;
$f$ = frequency of the respective decile class;
$m$ = cumulative frequency of the class preceding the respective decile class.


We shall examine this in detail with the help of an example:

Class      Frequency    Cumulative frequency
(1)        (2)          (3)
0 - 5      3            3
5 - 10     4            7
10 - 15    7            14
15 - 20    8            22
20 - 25    2            24
25 - 30    1            25
Total      25

From the frequencies given in col. (2), we should calculate the cumulative frequency as given in col. (3) of the table.

Total frequency N = 25; ∴ N/10 = 25/10 = 2.5.

$D_1$ relates to the N/10th (2.5th) item, which lies in the class 0 - 5.

$$D_1 = l + \frac{(N/10 - m) \times c}{f} = 0 + \frac{(25/10 - 0) \times 5}{3} = \frac{2.5 \times 5}{3} = \frac{12.5}{3} = 4.17$$

$D_2$ relates to the 2N/10th item: $2N/10 = (2 \times 25)/10 = 5$th item, which lies in the class 5 - 10.

$$D_2 = l + \frac{(2N/10 - m) \times c}{f} = 5 + \frac{(5 - 3) \times 5}{4} = 5 + 2.5 = 7.5$$

$3N/10 = (3 \times 25)/10 = 7.5$; $D_3$ class = 10 - 15.

$$D_3 = 10 + \frac{(7.5 - 7) \times 5}{7} = 10 + \frac{0.5 \times 5}{7} = 10\tfrac{5}{14} = 10.36$$

$4N/10 = (4 \times 25)/10 = 10$; $D_4$ class = 10 - 15.

$$D_4 = 10 + \frac{(10 - 7) \times 5}{7} = 10 + \frac{3 \times 5}{7} = 12\tfrac{1}{7} = 12.14$$

$5N/10 = (5 \times 25)/10 = 12.5$; $D_5$ class = 10 - 15.

$$D_5 = 10 + \frac{(12.5 - 7) \times 5}{7} = 10 + \frac{5.5 \times 5}{7} = 10 + \frac{27.5}{7} = 10 + 3.93 = 13.93$$

$6N/10 = (6 \times 25)/10 = 15$; $D_6$ class = 15 - 20.

$$D_6 = 15 + \frac{(15 - 14) \times 5}{8} = 15 + \frac{5}{8} = 15.63$$

$7N/10 = (7 \times 25)/10 = 17.5$; $D_7$ class = 15 - 20.

$$D_7 = 15 + \frac{(17.5 - 14) \times 5}{8} = 15 + \frac{3.5 \times 5}{8} = 15 + \frac{17.5}{8} = 15 + 2.19 = 17.19$$

$8N/10 = (8 \times 25)/10 = 20$; $D_8$ class = 15 - 20.

$$D_8 = 15 + \frac{(20 - 14) \times 5}{8} = 15 + \frac{6 \times 5}{8} = 15 + 3.75 = 18.75$$

$9N/10 = (9 \times 25)/10 = 22.5$; $D_9$ class = 20 - 25.

$$D_9 = 20 + \frac{(22.5 - 22) \times 5}{2} = 20 + \frac{0.5 \times 5}{2} = 20 + 1.25 = 21.25$$
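The nine continuous-distribution decile calculations above follow one pattern, sketched below under the assumption of equal class widths; `decile_continuous` is a hypothetical name, not from the text.

```python
def decile_continuous(lower_limits, width, freqs, k):
    """D_k = l + (kN/10 - m) * c / f, for classes of equal width c."""
    target = k * sum(freqs) / 10
    m = 0                                  # cum. frequency before the class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and m + f >= target:      # decile class found
            return l + (target - m) * width / f
        m += f
    raise ValueError("kN/10 not reached; check the frequencies")


limits = [0, 5, 10, 15, 20, 25]
freqs = [3, 4, 7, 8, 2, 1]
for k in range(1, 10):
    print(f"D{k} = {decile_continuous(limits, 5, freqs, k):.4f}")
```

Rounded to two decimal places, the printed values agree with the worked results (for example $D_5 = 13.93$, $D_8 = 18.75$, $D_9 = 21.25$).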


IV. Calculation from the graph

Deciles can be calculated from the Ogive curve for the less than cumulative frequency distribution. Corresponding to the items N/10, 2N/10, 3N/10, 4N/10, 5N/10, 6N/10, 7N/10, 8N/10 and 9N/10, points on the Y-axis should be located. From these points, we should draw straight lines parallel to the X-axis to cut the Ogive curve at different points called $P_1, P_2, P_3, P_4, P_5, P_6, P_7, P_8$ and $P_9$. From these points on the Ogive curve, we should draw straight lines perpendicular to the X-axis to cut the X-axis at 9 different points denoted by the letters $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8$ and $x_9$. The values of X at these 9 points indicate the respective decile values.


PERCENTILES

Another measure of location is the Percentile. These points of location divide the distribution or series into 100 equal parts, and as such there are 99 points of location. They are called Percentiles and are denoted by the letter P with the suffix corresponding to the serial number of the percentile, e.g. $P_1, P_2, P_3, \ldots, P_{99}$.

I. Calculation of Percentiles from ungrouped data

The methods are similar to those adopted in the case of quartiles, quintiles and deciles. The only difference is that the value of (n + 1) should be divided by 100. The values of the different percentiles relate to the values of the items mentioned against each below:

$P_1$ = Value of the $(n+1)/100$th item;
$P_2$ = Value of the $2(n+1)/100$th item;
$P_3$ = Value of the $3(n+1)/100$th item;
$P_5$ = Value of the $5(n+1)/100$th item;
$P_{10}$ = Value of the $10(n+1)/100$th item;
$P_{25}$ = Value of the $25(n+1)/100$th item;
$P_{50}$ = Value of the $50(n+1)/100$th item;
$P_{99}$ = Value of the $99(n+1)/100$th item.

The general formula can be written as follows:

$$P_r = \text{Value of the } \frac{r(n+1)}{100}\text{th item}$$

where r can take any value between 1 and 99, and n represents the total number of items in the series. Let us examine the following items and calculate the 15th, 30th and 65th percentiles.


21, 28, 34, 15, 30, 70, 65, 48, 51, 75.

Let us re-arrange these values in the ascending order as follows:

15, 21, 28, 30, 34, 48, 51, 65, 70, 75.

The total number of items, n = 10; n + 1 = 10 + 1 = 11.

(1) $P_{15}$ relates to the item

$$\frac{15 \times (10 + 1)}{100} = \frac{15 \times 11}{100} = \frac{165}{100} = 1.65\text{th item}$$

Value of the 1.65th item = Value of the 1st item + 0.65 × difference of the 1st and 2nd items:

$$P_{15} = 15 + 0.65(21 - 15) = 15 + 0.65 \times 6 = 15 + 3.90 = 18.90$$

(2) $P_{30}$ relates to the item

$$\frac{30 \times (10 + 1)}{100} = \frac{30 \times 11}{100} = 3.3\text{th item}$$

Value of the 3.3th item = Value of the 3rd item + 0.3 × difference between the 3rd and 4th items:

$$P_{30} = 28 + 0.3(30 - 28) = 28 + 0.3 \times 2 = 28.6$$

(3) $P_{65}$ relates to the item $65(10 + 1)/100 = 7.15$th item.

The value of the 7.15th item = Value of the 7th item + 0.15 × difference between the 7th and 8th items:

$$P_{65} = 51 + 0.15(65 - 51) = 51 + 2.10 = 53.10$$
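The three percentile calculations follow the general rule $r(n+1)/100$ with interpolation between neighbouring items. A minimal sketch; `percentile_ungrouped` is a hypothetical name. Note that this (n + 1) rule is the one in the text and is not necessarily the default rule of statistical software.

```python
def percentile_ungrouped(values, r):
    """P_r = value of the r(n + 1)/100 th item of the ascending array."""
    data = sorted(values)
    pos = r * (len(data) + 1) / 100
    if not 1 <= pos <= len(data):
        raise ValueError("percentile position lies outside the array")
    whole = int(pos)                 # e.g. 7 for the 7.15th item
    frac = pos - whole               # e.g. 0.15
    if frac == 0:
        return data[whole - 1]
    return data[whole - 1] + frac * (data[whole] - data[whole - 1])


items = [21, 28, 34, 15, 30, 70, 65, 48, 51, 75]
for r in (15, 30, 65):
    print(r, round(percentile_ungrouped(items, r), 2))
# 15 18.9
# 30 28.6
# 65 53.1
```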


II. Calculation of percentiles from discrete frequency distribution

The given frequencies should be converted into less than cumulative frequencies, and the sum of the total frequencies should be calculated.

The percentiles should be calculated as follows:

$P_1$ = Value of the $(N+1)/100$th item;
$P_{25}$ = Value of the $25(N+1)/100$th item;
$P_{99}$ = Value of the $99(N+1)/100$th item.

Example

Let us first calculate the less than cumulative frequency as given in col. (3).

Value of the item    Frequency    Cumulative frequency
(1)                  (2)          (3)
25                   4            4
35                   6            10
45                   5            15
55                   7            22
65                   3            25
75                   2            27
85                   2            29

N = 29; N + 1 = 30; ∴ (N + 1)/100 = 30/100 = 3/10.

$$P_5 = \frac{5(29 + 1)}{100} = \frac{5 \times 30}{100} = 1.5\text{th item}$$

Value of the 1.5th item = 25.

$$P_{20} = \frac{20(29 + 1)}{100} = \frac{20 \times 30}{100} = 6\text{th item}$$

Value of the 6th item = 35.

$$P_{44} = \frac{44(29 + 1)}{100} = \frac{44 \times 30}{100} = 13.2\text{th item}$$

Value of the 13.2th item = 45.
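The same cumulative-frequency reading gives $P_5$, $P_{20}$ and $P_{44}$. A sketch only; `percentile_discrete` is a hypothetical name, and values are assumed to be in ascending order.

```python
from itertools import accumulate


def percentile_discrete(values, freqs, r):
    """P_r = value of the r(N + 1)/100 th item: the first value whose
    less than cumulative frequency reaches that position."""
    pos = r * (sum(freqs) + 1) / 100
    for value, cum in zip(values, accumulate(freqs)):
        if cum >= pos:
            return value
    raise ValueError("position exceeds the total frequency")


values = [25, 35, 45, 55, 65, 75, 85]
freqs = [4, 6, 5, 7, 3, 2, 2]
print([percentile_discrete(values, freqs, r) for r in (5, 20, 44)])  # [25, 35, 45]
```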


III. Calculation of Percentiles from continuous frequency distribution

The true lower limits of the classes should be calculated. The sum of the total frequencies should be arrived at, and the less than cumulative frequency should be calculated. The percentile $P_r$ relates to the $rN/100$th item and is given by

$$P_r = l + \frac{(rN/100 - m) \times c}{f}$$

Let us calculate $P_{15}$, $P_{22}$ and $P_{70}$ from the following table.

Class      Frequency    Cumulative frequency
0 - 5      4            4
5 - 10     6            10
10 - 15    5            15
15 - 20    7            22
20 - 25    3            25
25 - 30    2            27
30 - 35    3            30
Total      30

N = 30; ∴ N/100 = 30/100 = 3/10.

$15N/100 = (15 \times 30)/100 = 4.5$; the $P_{15}$ class is 5 - 10.

$$P_{15} = l + \frac{(15N/100 - m) \times c}{f} = 5 + \frac{(4.5 - 4) \times 5}{6} = 5 + \frac{0.5 \times 5}{6} = 5\tfrac{5}{12} = 5.42$$

$22N/100 = (22 \times 30)/100 = 6.6$; this lies in the class 5 - 10.

$$P_{22} = 5 + \frac{(6.6 - 4) \times 5}{6} = 5 + \frac{2.6 \times 5}{6} = 5 + \frac{13}{6} = 7.17$$

$70N/100 = (70 \times 30)/100 = 21$; the 21st item lies in the class 15 - 20.

$$P_{70} = 15 + \frac{(21 - 15) \times 5}{7} = 15 + \frac{6 \times 5}{7} = 15 + 4\tfrac{2}{7} = 19.29$$
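The percentile formula for a continuous distribution can be sketched in the same way as the decile one, with $rN/100$ as the target position. Equal class widths are assumed and `percentile_continuous` is a hypothetical name.

```python
def percentile_continuous(lower_limits, width, freqs, r):
    """P_r = l + (rN/100 - m) * c / f, for classes of equal width c."""
    target = r * sum(freqs) / 100
    m = 0                                  # cum. frequency before the class
    for l, f in zip(lower_limits, freqs):
        if f > 0 and m + f >= target:      # percentile class found
            return l + (target - m) * width / f
        m += f
    raise ValueError("rN/100 not reached; check the frequencies")


limits = [0, 5, 10, 15, 20, 25, 30]
freqs = [4, 6, 5, 7, 3, 2, 3]
for r in (15, 22, 70):
    print(r, round(percentile_continuous(limits, 5, freqs, r), 2))
# 15 5.42
# 22 7.17
# 70 19.29
```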


IV. Calculation from the graph

Percentiles can also be computed from the Ogive curve for the continuous frequency distribution. Corresponding to the items N/100, 2N/100, 3N/100, ..., rN/100, ..., 99N/100, points on the Y-axis can be located. From these points on the Y-axis, we should draw straight lines parallel to the X-axis to cut the Ogive curve at different points, say $P_1, P_2, P_3, \ldots, P_r, \ldots, P_{99}$. From these points of intersection on the Ogive curve, we should draw straight lines perpendicular to the X-axis (or parallel to the Y-axis) to cut the X-axis at different points, say $X_1, X_2, \ldots, X_{99}$. The values of X corresponding to these points of location on the X-axis give the values of the respective percentiles.


Important points to be noted for calculating the Median, Quartiles, Quintiles, Deciles and Percentiles from a continuous frequency distribution:

Sometimes the mid-values or class marks of the classes will be given instead of the classes with their true lower and upper limits. In such cases, the difference between two successive class marks can be taken as the class interval. With the help of this class interval and the respective mid-values or class marks, the true lower and upper limits of the classes should be fixed.

From the frequencies, we should then calculate the cumulative frequencies of the distribution.
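Fixing the class limits from class marks can be sketched as below: each lower limit is the class mark minus half the class interval, and each upper limit is the mark plus half. This assumes equal class widths and marks listed in ascending order; `limits_from_class_marks` is a hypothetical name, and the marks in the example are simply the mid-values of the classes 0 - 5, 5 - 10, 10 - 15 and 15 - 20 used earlier.

```python
def limits_from_class_marks(marks):
    """Rebuild (lower, upper) class limits from equally spaced class marks."""
    if len(marks) < 2:
        raise ValueError("need at least two class marks")
    c = marks[1] - marks[0]          # class interval = gap between marks
    return [(x - c / 2, x + c / 2) for x in marks]


print(limits_from_class_marks([2.5, 7.5, 12.5, 17.5]))
# [(0.0, 5.0), (5.0, 10.0), (10.0, 15.0), (15.0, 20.0)]
```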


Properties

1. The values of Medians, Quartiles, Quintiles, Deciles and Percentiles should always be expressed in the same unit of measurement as the original units in the distribution.

2. These values are influenced by the location or position of the items and are not influenced by the magnitude of the values of the items.

LIST OF FORMULAE FOR CALCULATION OF MEASURES OF CENTRAL TENDENCY

Measure (1) | Ungrouped data (2) | Discrete frequency distribution (3) | Continuous frequency distribution (4)

1. Mean | $\Sigma x/n$ | $\Sigma fx/N$ | $\Sigma fx/N$ (where x is the mid-value)

2. Median | $(n+1)/2$th item | $(N+1)/2$th item | $N/2$th item: $l + (N/2 - m)c/f$

3. Quartiles
   $Q_1$ | $(n+1)/4$th item | $(N+1)/4$th item | $l + (N/4 - m)c/f$
   $Q_3$ | $3(n+1)/4$th item | $3(N+1)/4$th item | $l + (3N/4 - m)c/f$

4. Quintiles
   1st | $(n+1)/5$th item | $(N+1)/5$th item | $l + (N/5 - m)c/f$
   2nd | $2(n+1)/5$th item | $2(N+1)/5$th item | $l + (2N/5 - m)c/f$
   3rd | $3(n+1)/5$th item | $3(N+1)/5$th item | $l + (3N/5 - m)c/f$
   4th | $4(n+1)/5$th item | $4(N+1)/5$th item | $l + (4N/5 - m)c/f$

5. Deciles
   $D_1$ | $(n+1)/10$ | $(N+1)/10$ | $l + (N/10 - m)c/f$
   $D_2$ | $2(n+1)/10 = (n+1)/5$ | $2(N+1)/10 = (N+1)/5$ | $l + (2N/10 - m)c/f$
   $D_3$ | $3(n+1)/10$ | $3(N+1)/10$ | $l + (3N/10 - m)c/f$
   $D_4$ | $4(n+1)/10 = 2(n+1)/5$ | $4(N+1)/10 = 2(N+1)/5$ | $l + (4N/10 - m)c/f$
   $D_5$ (Median) | $5(n+1)/10$ | $5(N+1)/10$ | $l + (5N/10 - m)c/f$
   $D_6$ | $6(n+1)/10$ | $6(N+1)/10$ | $l + (6N/10 - m)c/f$
   $D_7$ | $7(n+1)/10$ | $7(N+1)/10$ | $l + (7N/10 - m)c/f$
   $D_8$ | $8(n+1)/10 = 4(n+1)/5$ | $8(N+1)/10 = 4(N+1)/5$ | $l + (8N/10 - m)c/f$
   $D_9$ | $9(n+1)/10$ | $9(N+1)/10$ | $l + (9N/10 - m)c/f$

6. Percentiles
   $P_1$ | $(n+1)/100$ | $(N+1)/100$ | $l + (N/100 - m)c/f$
   $P_5$ | $5(n+1)/100$ | $5(N+1)/100$ | $l + (5N/100 - m)c/f$
   $P_{10}$ | $10(n+1)/100$ | $10(N+1)/100$ | $l + (10N/100 - m)c/f$
   $P_{20}$ | $20(n+1)/100$ | $20(N+1)/100$ | $l + (20N/100 - m)c/f$
   $P_{25}$ (= $Q_1$) | $25(n+1)/100$ | $25(N+1)/100$ | $l + (25N/100 - m)c/f$
   $P_{30}$ | $30(n+1)/100$ | $30(N+1)/100$ | $l + (30N/100 - m)c/f$
   $P_{40}$ | $40(n+1)/100$ | $40(N+1)/100$ | $l + (40N/100 - m)c/f$
   $P_{50}$ (Median) | $50(n+1)/100$ | $50(N+1)/100$ | $l + (50N/100 - m)c/f$
   $P_{60}$ | $60(n+1)/100$ | $60(N+1)/100$ | $l + (60N/100 - m)c/f$
   $P_{70}$ | $70(n+1)/100$ | $70(N+1)/100$ | $l + (70N/100 - m)c/f$
   $P_{75}$ (= $Q_3$) | $75(n+1)/100$ | $75(N+1)/100$ | $l + (75N/100 - m)c/f$
   $P_{80}$ | $80(n+1)/100$ | $80(N+1)/100$ | $l + (80N/100 - m)c/f$
   $P_{85}$ | $85(n+1)/100$ | $85(N+1)/100$ | $l + (85N/100 - m)c/f$
   $P_{90}$ | $90(n+1)/100$ | $90(N+1)/100$ | $l + (90N/100 - m)c/f$
   $P_{99}$ | $99(n+1)/100$ | $99(N+1)/100$ | $l + (99N/100 - m)c/f$

In column (4), l, m, f and c refer to the class containing the required item, as defined earlier in this chapter.


MODE

Another measure of location or central tendency is the Mode. Mode is defined as that value which occurs most frequently, i.e. the most typical value. Generally, the Mode represents the value having the largest or maximum frequency. Mode can be calculated for ungrouped data, for discrete frequency distributions and for continuous frequency distributions.

1. Calculation of mode from ungrouped data

Mode can be calculated with ease in the case of ungrouped data. The first step in this process is the re-arrangement of the values in the series in the ascending order of their magnitude. From the series thus arranged, the value which occurs most frequently, i.e. the greatest number of times, can be selected as the Mode. Let us see an example:

7, 8, 10, 10, 10, 11, 12, 12, 25, 25, 29.

It is seen that certain values, for example 10, 12 and 25, occur more than once. While the values 12 and 25 occur twice, the value 10 occurs thrice. Hence the Mode or Modal value for this series is 10.

In the case of Mode, a change in the value of one item can alter the value of the Mode. Suppose we replace one of the 10s by 25 in the above series; the series then changes as follows:

7, 8, 10, 10, 11, 12, 12, 25, 25, 25, 29.

Mode of this series = 25.

Because the Modal value changes after a slight alteration in the value of even one item, it is said that the Modal value is highly unstable or not steady.

Unimodal

A series may have only one Modal value; it is then called a unimodal series.

Example: 7, 8, 10, 10, 10, 11, 12, 12.

Mode = 10.

Bi-modal

Sometimes a series may have two values as modal values; it is then said to be a bi-modal set or series.

Example: 7, 8, 10, 10, 10, 11, 12, 12, 12.

In this series, two values, namely 10 and 12, occur thrice each. Hence this series has two Modal values, and the Modal values are 10 and 12.

Tri-modal

A series which has three Modal values is called a tri-modal series, e.g. 7, 8, 10, 10, 11, 11, 12, 12.

In this series, three values, namely 10, 11 and 12, occur two times each. Hence it is a tri-modal series and the Modal values are 10, 11 and 12.

Multi-modal

A series which has more than three Modal values is called a multi-modal series.
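The selection rule, including the bi-modal and tri-modal cases, can be sketched with the standard library's `collections.Counter`; `modes` is a hypothetical helper returning every value that shares the highest count. One case the text does not discuss is handled by assumption: if no value repeats, the function simply returns every value.

```python
from collections import Counter


def modes(values):
    """All values occurring the maximum number of times, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([7, 8, 10, 10, 10, 11, 12, 12, 25, 25, 29]))  # [10]
print(modes([7, 8, 10, 10, 11, 12, 12, 25, 25, 25, 29]))  # [25]
print(modes([7, 8, 10, 10, 10, 11, 12, 12, 12]))          # [10, 12]
print(modes([7, 8, 10, 10, 11, 11, 12, 12]))              # [10, 11, 12]
```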

2. Calculation of Mode for discrete frequency distribution

In a discrete frequency distribution, the value of the item having the highest or greatest or maximum frequency is taken as the mode.

Example

Value of the items (x)    Frequency (f)
8                         4
10                        3
12                        2
14                        7
16                        2
18                        1
20                        1

In this case the value 14 has the greatest frequency, namely 7.

Hence Mode = 14.
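For a discrete frequency distribution the same idea reduces to picking the value with the largest frequency. A short sketch; `mode_discrete` is a hypothetical name, and it returns a list so that a tie (more than one modal value) is not hidden.

```python
def mode_discrete(values, freqs):
    """Value(s) of the items having the maximum frequency."""
    best = max(freqs)
    return [v for v, f in zip(values, freqs) if f == best]


print(mode_discrete([8, 10, 12, 14, 16, 18, 20], [4, 3, 2, 7, 2, 1, 1]))  # [14]
```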


3. Calculation of Mode for continuous frequency distribution

A. Crude method

1. The maximum frequency of the distribution should be found out first.

2. The class having the highest frequency should then be determined. The class having the highest or maximum frequency is known as the Modal class.

3. The mid-value (x) of the Modal class should be found out by taking the average of the lower and upper limits of the class. This mid-value is taken as the Mode.

The assumption that the Mode lies at the mid-value of the Modal class is only an approximation. Hence the Mode obtained by this method is only an approximate value and not an accurate value.

Example (1)

Weight in kg (x)    Frequency (f)
40 - 50             4
50 - 60             6
60 - 70             7
70 - 80             12
80 - 90             4
90 - 100            6
100 - 110           8

Maximum frequency = 12.
Modal class = 70 - 80.
Mid-value of the modal class = (70 + 80)/2 = 150/2 = 75 kg.
Mode = 75 kg.

Example (2)

Life in hours (x)    No. of tube lights (f)
400 - 700            7
700 - 1000           10
1000 - 1300          4
1300 - 1600          14
1600 - 1900          6
1900 - 2200          4
2200 - 2500          2

Highest frequency = 14.
Modal class = 1300 - 1600.
Mid-value = (1300 + 1600)/2 = 2900/2 = 1450.
Mode = 1450 hours.
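The crude method amounts to taking the mid-point of the class with the highest frequency. A minimal sketch; `crude_mode` is a hypothetical name, and if two classes tie for the highest frequency it simply takes the first one (an assumption, since the text does not discuss ties).

```python
def crude_mode(classes, freqs):
    """Mid-value of the class with the highest frequency (crude method)."""
    i = freqs.index(max(freqs))          # first modal class if there is a tie
    lower, upper = classes[i]
    return (lower + upper) / 2


weights = [(40, 50), (50, 60), (60, 70), (70, 80), (80, 90), (90, 100), (100, 110)]
print(crude_mode(weights, [4, 6, 7, 12, 4, 6, 8]))      # 75.0
lives = [(400, 700), (700, 1000), (1000, 1300), (1300, 1600),
         (1600, 1900), (1900, 2200), (2200, 2500)]
print(crude_mode(lives, [7, 10, 4, 14, 6, 4, 2]))       # 1450.0
```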


B. Calculation of Mode by giving weights to the preceeding class 

and succeeding class of the modal class 

The value of the mode is sometimes affected or influenced 
by the frequencies in the preceeding and succeeding classes of 
the modal class. 


If the frequency of the preceeding class is greater than the 
frequency of the succeeding class, the value of the Modal class 
will be nearer to the lower limit of the Modal class instead of 
concentrating on the mid - value of the Modal class . 
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On the other hand, if the frequency of the succeeding class 
is greater than the frequency of the preceeding class then the 
value of the Mode will be nearer to the upper limit of the Modal 
class instead of concentrating on the Mid - value of the modal 
class. 


: 


Therefore , in calculating the value of the Mode, the 
frequencies of the preceeding and succeeding classes are also taken 
into consideration . 


1 


:: 


Example ( 1 ) 


Weight in kg . 

( x ) 


No. of bundles 

( f ) 


40 – 


50 


4 . 


50 - 60 


6 


60 


70 


7 


12 - 


70 – 80 
80- 90 


4 


90 - 100 


6 


100 - 110 


8 


The modal class is 70-80 since it is having the greatest 
frequency namely 12 . 


Lower limit of the modal class 


= 


1 = 70 . 


. 


1 


: 


Preceeding class 60 - 70 . 
Frequency of the preceeding class = 7 . 
Let us denote it by the lett . r fi : .. 
J1 

= 7 . 
Succeeding class 80-90 . 
Frequency cf the succeeding class = 4 . 
Let it be denoted by the letter f . i.fi : 4. , 
Width of the Modal class = 80 - 70 

80—70 = 10 . 
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The following formula is adopted for calculation of Mode . 

cf. 
Mode = l + 

fi + fi 


Let us substitute the value in the formula . 

10 X 4 
Mode 

7 +4 . 


70 + 


: 


} 


40 


70 + 


7 
70 + 3 . 

11 


Ź 
73__ 

11 


or 73.64 kg. 


11 


Example ( 2 ) 


; 


Life in hours 

( x ) 


No. of tube lights 

( f ) 


400 – 700 


7 


700 - 1000 


10 


1000 — 1300 


4 : 


1300 - 1600 


14 


1600 — 1900 


6 


1900 - 2200 


4 


2200 - 2500 


2 


Maximum frequency = 14.
Modal class = 1300 - 1600.
l = true lower limit of the modal class = 1300.
c = width of the modal class = 1600 - 1300 = 300.
f_1 = frequency of the preceding class = 4.
f_2 = frequency of the succeeding class = 6.


Mode = l + \frac{c \cdot f_2}{f_1 + f_2} = 1300 + \frac{300 \times 6}{4 + 6}
= 1300 + 1800/10
= 1300 + 180
= 1480 hours.


C. Calculation of Mode by taking the differences in frequencies between (1) the modal class and the preceding class and (2) the modal class and the succeeding class


The above formula undergoes a slight change as follows:

Mode = l + \frac{c \cdot d_1}{d_1 + d_2}

where
l = true lower limit of the modal class,
c = width or class interval of the modal class,
d_1 = difference between the frequencies of the modal class and the preceding class (only the absolute value, without sign, is considered) = 14 - 4 = 10,
d_2 = difference between the frequencies of the modal class and the succeeding class = 14 - 6 = 8.
Substituting these values in the formula, we get

Mode = l + \frac{c \cdot d_1}{d_1 + d_2} = 1300 + \frac{300 \times 10}{10 + 8}
= 1300 + 3000/18
= 1300 + 166.67
= 1466.67 hours.
Note: Of these two formulae, the second gives a more accurate result and hence it is preferable:

1. Mode = l + \frac{c \cdot f_2}{f_1 + f_2}
2. Mode = l + \frac{c \cdot d_1}{d_1 + d_2}

A slight change in the second term of the formula may be noted:

(1) \frac{c \cdot f_2}{f_1 + f_2}        (2) \frac{c \cdot d_1}{d_1 + d_2}


In the first case , the frequency of the modal class is not at 
all considered while in the second case the frequency of the modal 
class is also considered indirectly by calculating the difference of 
the frequencies of the neighbouring classes from the frequency of 
the Modal class . Hence the second formula gives better results . 


(1) Actual frequencies of the preceding and succeeding classes are used in the first formula. In the second formula, the differences between these frequencies and the frequency of the modal class are used.


(2) In the numerator of the first formula, the actual frequency of the succeeding class is used. In the second formula, the difference between the frequency of the modal class and that of the preceding class is used.


(3) One practical difficulty may arise in using these two formulae. Sometimes the modal class may happen to be either the first or the last class. In such rare situations there is no preceding class or no succeeding class, and it is better to take the mid-value of the modal class itself as the Mode.
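A sketch comparing the two formulae, including rule (3) for a modal class at either end of the table. The function name `grouped_mode`, its `method` argument and the `(lower, upper, frequency)` tuple layout are assumptions made for illustration; classes are assumed to have equal widths.

```python
def grouped_mode(classes, method="difference"):
    """Mode of a grouped table given as (lower, upper, frequency) tuples.

    method="neighbour":  Mode = l + c*f2/(f1 + f2)
    method="difference": Mode = l + c*d1/(d1 + d2)
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))              # modal class = greatest frequency
    lower, upper, fm = classes[k]
    c = upper - lower
    # Rule (3): modal class is the first or last class -> use its mid-value
    if k == 0 or k == len(classes) - 1:
        return (lower + upper) / 2
    f1, f2 = freqs[k - 1], freqs[k + 1]
    if method == "neighbour":
        return lower + c * f2 / (f1 + f2)
    d1, d2 = abs(fm - f1), abs(fm - f2)      # absolute differences only
    return lower + c * d1 / (d1 + d2)


tubes = [(400, 700, 7), (700, 1000, 10), (1000, 1300, 4), (1300, 1600, 14),
         (1600, 1900, 6), (1900, 2200, 4), (2200, 2500, 2)]
print(grouped_mode(tubes, "neighbour"))      # 1480.0
print(round(grouped_mode(tubes), 2))         # 1466.67
```

Keeping both formulae behind one argument makes it easy to see, on the same data, how much the choice of formula shifts the Mode.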


D. Calculation of Mode from Mean and Median 


Mode can also be calculated from the two other measures of central tendency, namely Mean and Median. In symmetrical distributions, the frequencies on either side of the Mean will be more or less identical. An example of one such distribution is given below:


(x)        (f)
0 - 5       2
5 - 10      3
10 - 15     4
15 - 20    10
20 - 25     4
25 - 30     3
30 - 35     2


In the above distribution, the class interval is uniform. The maximum frequency is 10 in the class 15 - 20. On either side of this class, the frequencies are identical. This is called a symmetrical distribution.


In such symmetrical distributions, Mean and Median will have the same value. In distributions that are only moderately asymmetrical, Mean and Median will still lie close to each other, and consequently their difference will be very small. Hence the difference between Mean and Median can be taken as a measure for ascertaining the symmetry. But the situation is different in the case of the Mode: the difference between Mean and Mode will be greater than that between Mean and Median. When the difference between Mean and Mode (Mean - Mode) is compared with the difference between Mean and Median (Mean - Median), it is found empirically that, for moderately asymmetrical distributions, the difference between Mean and Mode is approximately equal to thrice the difference between Mean and Median, that is,

Mean - Mode = 3 (Mean - Median).

Hence this is used as a formula to determine the value of the Mode:

Mean - Mode = 3 (Mean - Median)
∴ Mode = Mean - 3 (Mean - Median) = 3 Median - 2 Mean.


Example: Calculate the Mode, given Mean = 125 and Median = 115.

(1) Mean - Mode = 3 (Mean - Median)
125 - Mode = 3 (125 - 115) = 3 × 10 = 30
∴ Mode = 125 - 30 = 95.

(2) Mode = 3 Median - 2 Mean
= 3 × 115 - 2 × 125
= 345 - 250
= 95.
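A small sketch of the Mean-Median relation follows. It assumes the inputs come from a moderately asymmetrical distribution; because the relation is empirical, the result is only an approximation. The function name is hypothetical.

```python
def mode_from_mean_median(mean, median):
    # Empirical relation: Mean - Mode = 3 (Mean - Median),
    # rearranged as Mode = 3 Median - 2 Mean.
    return 3 * median - 2 * mean


mode = mode_from_mean_median(125, 115)
print(mode)                              # 95
# Check form (1): Mean - Mode should equal 3 (Mean - Median)
print(125 - mode == 3 * (125 - 115))     # True
```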


E. Determination of Mode from graph

Mode can also be determined from a graph. In this connection, students should note one basic difference. While the other measures, like Median, Quartiles, Quintiles, Deciles and Percentiles, are calculated from the cumulative frequencies, the Mode is calculated from the actual frequencies only. Since cumulative frequencies are used for the other measures, the Ogive (the curve of 'less than' cumulative frequencies) is used to locate them graphically. As the Mode has to be calculated from the actual frequencies, the curve of the frequencies, namely the frequency curve, has to be used for the Mode.


In a frequency curve, the X-axis represents the values and the Y-axis represents the frequencies. In other words, the X co-ordinate represents the value of the items and the Y co-ordinate represents the frequency. Different Y co-ordinates are plotted for different X co-ordinates, and the value of the X co-ordinate for which the Y co-ordinate is the maximum is taken as the Mode.

Through the apex (highest point) of the frequency curve, a vertical line perpendicular to the X-axis should be drawn. The point where this vertical line meets the X-axis may be noted; the value of X corresponding to this point of intersection represents the Mode.


Ungrouped data:
1. Values have to be rearranged in ascending order.
2. Mode = the value of the item in the array which occurs the greatest number of times.

Discrete frequency distribution:
1. Values of the items have to be arranged in ascending order.
2. Mode = the value of the item having the highest frequency.

Continuous frequency distribution:
1. Mode = mid-value of the modal class.
2. Mode = l + \frac{c \cdot f_2}{f_1 + f_2}   or   Mode = l + \frac{c \cdot d_1}{d_1 + d_2}
3. Mean - Mode = 3 (Mean - Median)
4. Mode = 3 Median - 2 Mean
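For ungrouped data and discrete frequency distributions, the rule above (the most frequent value) can be sketched with Python's standard library. `statistics.multimode` (Python 3.8+) returns every most frequent value, so it also shows bi-modal cases. `discrete_modes` and the dictionary format for a frequency table are hypothetical, and the marks are made-up data. Counting replaces the manual step of arranging values in ascending order.

```python
from statistics import multimode

# Ungrouped data: the value(s) occurring the greatest number of times.
marks = [3, 5, 5, 7, 8, 8, 8, 9]              # hypothetical data
print(multimode(marks))                       # [8]
print(multimode([1, 2, 2, 3, 3]))             # [2, 3]  (bi-modal)


def discrete_modes(table):
    """Value(s) having the highest frequency.

    table: dict mapping each value to its frequency (assumed format).
    """
    top = max(table.values())
    return sorted(v for v, f in table.items() if f == top)


print(discrete_modes({10: 4, 20: 9, 30: 7}))  # [20]
```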


GEOMETRIC MEAN (G.M.) 


We have already studied the Arithmetic Mean . In this 
section we shall study another mean called Geometric Mean . 


While the Arithmetic Mean is calculated from the sum of the values of the items, the Geometric Mean is calculated from the product of the values of the items. The Arithmetic Mean is obtained by dividing the sum by the number of items; the Geometric Mean is obtained by taking the root of the product corresponding to the number of items (the square root for two items, the cube root for three items, and so on).


A. Calculation of Geometric Mean for ungrouped data

1. Let us calculate the Geometric Mean of the items 9 and 16.

Number of items = 2.
Product of the items = 9 × 16 = 144.
Since there are only two items, we should find the square root of the product: \sqrt{144} = 12.
∴ Geometric Mean = 12.

2. Calculate the Geometric Mean of the following items: 4, 16 and 8.

Number of items = 3.
Product of the items = 4 × 16 × 8 = 512.
Geometric Mean = \sqrt[3]{512} = 8.


General Formula

The general formula can be derived as follows. Suppose there are n items, say x_1, x_2, x_3, \ldots, x_n. Then

G.M. = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n}

(or) G = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{1/n}
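The general formula can be checked directly in Python by multiplying the items and taking the nth root. This sketch assumes positive values and Python 3.8+ for `math.prod`; the function name is hypothetical. The standard library also offers `statistics.geometric_mean`, which works through logarithms.

```python
import math


def geometric_mean_direct(values):
    """n-th root of the product of n positive values."""
    n = len(values)
    return math.prod(values) ** (1 / n)


print(geometric_mean_direct([9, 16]))                 # 12.0
print(round(geometric_mean_direct([4, 16, 8]), 6))    # 8.0
```

The cube root is rounded because floating-point arithmetic gives a value a hair below 8.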


Shortcut Method

1. If there are many items, finding the product by multiplication is tedious, and finding the nth root of the product is still more laborious. Hence we need a shortcut method. The shortcut for multiplication is the use of logarithms. Therefore, we can use logarithms for finding the Geometric Mean.

Those who are not familiar with logarithms may follow the illustration given below:

a^3 \times a^4 = a^{3+4} = a^7, and not a^{3 \times 4} = a^{12}.

This is because a^3 and a^4 can be written as follows:

a^3 = a \times a \times a   and   a^4 = a \times a \times a \times a.

\therefore a^3 \times a^4 = (a \times a \times a) \times (a \times a \times a \times a) = a^{3+4} = a^7

Here the figures are expressed in terms of powers of a.

Let us substitute the value 10 for a:

a^3 = 10^3 = 10 × 10 × 10 = 1000
a^4 = 10^4 = 10 × 10 × 10 × 10 = 10,000

So 1000 × 10,000 can be written as 10^3 × 10^4 = 10^{3+4} = 10^7.

The powers are called logarithms:

\therefore \log_{10}(1000 \times 10{,}000) = \log_{10} 1000 + \log_{10} 10{,}000 = 3 + 4 = 7


From this, it can be seen that, when numbers are expressed as powers of a common value (the base), the power of the product of any two numbers is equal to the sum of the powers of the numbers. The same principle applies to the product of any number of numbers. From the power of the product we can find the product itself. The power is called the logarithm, and the number corresponding to a given logarithm is called its anti-logarithm. The common value in which the power is expressed is called the base.
In order to find the product of different numbers, we use the following procedure:

1. We first find the logarithm of each number.
2. We then add the logarithms and find their sum.
3. We find the anti-logarithm of the sum of the logarithms; this anti-logarithm represents the product.
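The three-step procedure above (logarithms, sum, anti-logarithm) can be imitated in Python with base-10 logarithms. This is only a sketch for positive numbers; the function name is hypothetical, and floating-point results may differ from the exact product in the last digits, so the printed values are rounded.

```python
import math


def product_via_logs(numbers):
    # Step 1: logarithm of each number; Step 2: add them
    log_sum = sum(math.log10(x) for x in numbers)
    # Step 3: the anti-logarithm of the sum is the product
    return 10 ** log_sum


print(round(product_via_logs([1000, 10_000])))   # 10000000
print(round(product_via_logs([4, 16, 8]), 6))    # 512.0
```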


The same procedure can be followed to find the Geometric Mean:

1. Find the logarithm of each value.
2. Add the logarithms and find their sum.
3. Divide the sum of the logarithms by the number of items in the series, i.e. find the Arithmetic Mean of the logarithms. This Arithmetic Mean of the logarithms represents the logarithm of the Geometric Mean.
4. Lastly, find the anti-logarithm of the Arithmetic Mean of the logarithms; this is equal to the G.M.


Formula

The formula can be derived as follows. Let there be n values of x, represented by x_1, x_2, x_3, \ldots, x_n.

Sum of the logarithms = \log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n = \sum \log x_i

Arithmetic Mean of the logarithms = \frac{\log x_1 + \log x_2 + \log x_3 + \cdots + \log x_n}{n} = \frac{\sum \log x_i}{n} = \log(G.M.)

∴ Geometric Mean = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right)
Let us calculate the Geometric Mean of the following items: 1225, 148, 79, 1478 and 9.

log 1225 = 3.0881
log 148 = 2.1703
log 79 = 1.8976
log 1478 = 3.1697
log 9 = 0.9542
Total = 11.2799

n = 5.

Mean of the logarithms = \frac{11.2799}{5} = 2.25598, or 2.2560 (rounded).

Antilog of 2.2560 = 180.3.

∴ Geometric Mean = 180.3
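The four steps and the formula for ungrouped data translate directly into code. This sketch (hypothetical function name, positive values only) uses full-precision logarithms instead of four-figure tables, so it agrees with the worked answer after rounding.

```python
import math


def geometric_mean_logs(values):
    """G.M. = antilog( sum(log x) / n ) for positive values."""
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log        # anti-logarithm (base 10)


items = [1225, 148, 79, 1478, 9]
print(round(geometric_mean_logs(items), 1))   # 180.3
```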


B. Calculation of Geometric Mean from discrete frequency distribution

Let us calculate the Geometric Mean of the following values:

Value (x)     Frequency (f)
3  (x_1)       2  (f_1)
4  (x_2)       3  (f_2)
6  (x_3)       1  (f_3)
8  (x_4)       4  (f_4)
Total         10

This can be re-written as follows:

3, 3, 4, 4, 4, 6, 8, 8, 8, 8.

log 3 = 0.4771
log 3 = 0.4771
log 4 = 0.6021
log 4 = 0.6021
log 4 = 0.6021
log 6 = 0.7782
log 8 = 0.9031
log 8 = 0.9031
log 8 = 0.9031
log 8 = 0.9031
Total = 7.1511


We can adopt a still shorter method in totalling the logarithms.

The value 3 is repeated two times. Therefore, instead of writing log 3 twice, we can multiply log 3 by 2. In the same manner, we can multiply log 4 by 3, log 6 by 1 and log 8 by 4, and calculate the total of the logarithms as follows:

2 × log 3 = 2 × 0.4771 = 0.9542
3 × log 4 = 3 × 0.6021 = 1.8063
1 × log 6 = 1 × 0.7782 = 0.7782
4 × log 8 = 4 × 0.9031 = 3.6124
Total = 7.1511
After finding the total, we can find the average of the logarithms. Since there are 10 items, we divide the total by 10:

\frac{7.1511}{10} = 0.7151

Log (G.M.) = 0.7151

G.M. = antilog of 0.7151 = 5.189.


∴ The formula can be written as follows:

Log (G.M.) = \frac{2 \times \log 3 + 3 \times \log 4 + 1 \times \log 6 + 4 \times \log 8}{2 + 3 + 1 + 4}

If we replace the values by x_1, x_2, x_3 and x_4, and the frequencies by f_1, f_2, f_3 and f_4, the formula would emerge as follows:

Log (G.M.) = \frac{f_1 \log x_1 + f_2 \log x_2 + f_3 \log x_3 + f_4 \log x_4}{f_1 + f_2 + f_3 + f_4}

If there are n values, the general formula would be

\log G = \frac{f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum f_i \log x_i}{\sum f_i}

∴ Geometric Mean = \text{Antilog}\left(\frac{\sum f \log x}{N}\right), where N = \sum f.
Therefore the previous example can be worked out as follows:

Value (x)   Frequency (f)   log x    f log x
3            2              0.4771   0.9542
4            3              0.6021   1.8063
6            1              0.7782   0.7782
8            4              0.9031   3.6124
Total       10                       7.1511

Average of log x = \frac{7.1511}{10} = 0.7151 = Log of Geometric Mean

G.M. = antilog (0.7151) = 5.189
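For a discrete frequency distribution, each logarithm is weighted by its frequency. The sketch below (hypothetical function name; values and frequencies passed as two parallel lists) reproduces the table above.

```python
import math


def weighted_geometric_mean(values, freqs):
    """G.M. = antilog( sum(f * log x) / N ), with N = sum(f)."""
    N = sum(freqs)
    total = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (total / N)


print(round(weighted_geometric_mean([3, 4, 6, 8], [2, 3, 1, 4]), 3))   # 5.189
```

Weighting by frequency gives the same answer as writing each value out as many times as it occurs (3, 3, 4, 4, 4, 6, 8, 8, 8, 8) and using the ungrouped formula.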


C. Calculation of Geometric Mean from continuous frequency distribution

We shall consider the following frequency distribution:

Class interval   Frequency
0 - 10            4
10 - 20           8
20 - 30          10
30 - 40           5
40 - 50           3
Total            30
1. We should first find the mid-value of each class. Afterwards, the procedure will be similar to the method adopted in the case of a discrete frequency distribution. The distribution can be written as follows:


Mid-value (x_i)   Frequency (f)   log x    f × log x
(1)               (2)             (3)      (4)
5                  4              0.6990    2.7960
15                 8              1.1761    9.4088
25                10              1.3979   13.9790
35                 5              1.5441    7.7205
45                 3              1.6532    4.9596
Total             30                       38.8639

N = \sum f = 30.
\sum f \log x = 38.8639.

\log(G.M.) = \frac{\sum f \log x}{N} = \frac{38.8639}{30} = 1.2955

G.M. = Antilog (1.2955) = 19.74
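For a continuous distribution, the only extra step is replacing each class by its mid-value. In this sketch (hypothetical function name; classes as `(lower, upper, frequency)` tuples), full-precision logarithms give about 19.75; the small difference from the 19.74 above comes from rounding in four-figure log and antilog tables.

```python
import math


def grouped_geometric_mean(classes):
    """classes: (lower, upper, frequency); x_i is the mid-value of each class."""
    N = sum(f for _, _, f in classes)
    total = sum(f * math.log10((lo + up) / 2) for lo, up, f in classes)
    return 10 ** (total / N)


table = [(0, 10, 4), (10, 20, 8), (20, 30, 10), (30, 40, 5), (40, 50, 3)]
print(round(grouped_geometric_mean(table), 2))   # about 19.75
```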


Uses of Geometric Mean

1. Geometric Mean is a better average to indicate the rate of change. When percentage increases over a period of time are given, we must use the Geometric Mean to find the average percentage increase.

2. Geometric Mean is mostly used in cases where the variation in the values over a period of time takes place at a compound rate, as in the case of investment at compound interest or in the growth of population.


Example

(1) Where percentage increases are given:

The population of a country increased by 15% from 1941 to 1951, by 20% from 1951 to 1961, and by 25% from 1961 to 1971. Calculate the average increase per decade in the population.

We should assume the population at the beginning of each decade as 100 and proceed further:

1. Population at the beginning of the first decade (1941 - 51) = 100
   Population at the end of the first decade = 100 + 15 = 115
2. Population at the beginning of the second decade (1951 - 61) = 100
   Population at the end of the second decade = 100 + 20 = 120
3. Population at the beginning of the third decade (1961 - 71) = 100
   Population at the end of the third decade = 100 + 25 = 125


The Geometric Mean of the population at the end of each decade is obtained by using the following formula:

Geometric Mean = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right), where n = 3.

log x_1 = log 115 = 2.0607
log x_2 = log 120 = 2.0792
log x_3 = log 125 = 2.0969
Total = 6.2368

\frac{\sum \log x_i}{n} = \frac{6.2368}{3} = 2.0789

Geometric Mean = Antilog (2.0789) = 119.9

Rate of increase = 119.9 - 100.0 = 19.9% per decade.
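The averaging of percentage increases can be sketched as follows, assuming the increases are given in percent per period; the function name is hypothetical. Each increase is turned into a relative (15% becomes 115) before taking the Geometric Mean.

```python
import math


def average_rate_from_percentages(pct_increases):
    """Average % increase per period = GM of (100 + p) values, minus 100."""
    relatives = [100 + p for p in pct_increases]      # e.g. 15% -> 115
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log - 100


print(round(average_rate_from_percentages([15, 20, 25]), 1))   # 19.9
```

A simple Arithmetic Mean of 15, 20 and 25 would give 20%, which overstates the growth: 1.20^3 = 1.728, whereas the actual growth factor over the three decades is 1.15 × 1.20 × 1.25 = 1.725.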


(2) Where absolute figures are given:

Sometimes absolute figures will be given. In such cases, we should first convert the absolute figures into percentages (relatives) and proceed afterwards as before.


Year    Population in millions
1901    200
1911    225
1921    260
1931    290


Here absolute figures are given:

1. If the population of 1901 is taken as 100, the population of 1911 will be
   \frac{225}{200} \times 100 = 112.5
2. If the population of 1911 is taken as 100, the population of 1921 will be
   \frac{260}{225} \times 100 = \frac{1040}{9} = 115.6
3. If the population of 1921 is taken as 100, the population of 1931 will be
   \frac{290}{260} \times 100 = 111.5


The relative values are 112.5, 115.6 and 111.5.

n = number of decades = 3.

log (112.5) = 2.0511
log (115.6) = 2.0630
log (111.5) = 2.0472
Total = 6.1613

Log (G.M.) = \frac{6.1613}{3} = 2.0538

G.M. = Antilog (2.0538) = 113.2

Increase in population = 113.2 - 100.0 = 13.2% per decade.
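The same idea works on absolute figures once each figure is expressed as a percentage of the previous one (a link relative). This is a sketch with a hypothetical function name, using the population figures of the table above.

```python
import math


def average_growth_from_series(series):
    """Average % change per period from absolute figures via link relatives."""
    # Link relative: each figure as a percentage of the previous one
    relatives = [cur / prev * 100 for prev, cur in zip(series, series[1:])]
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log - 100


population = [200, 225, 260, 290]     # 1901, 1911, 1921, 1931 (millions)
print(round(average_growth_from_series(population), 1))   # 13.2
```

Because the link relatives multiply out to 290/200, the result equals ((290/200)^{1/3} - 1) × 100, about 13.19%.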


We shall use the Geometric Mean in cases where the variations in the values take place at a compound rate, as in the case of compound interest or the growth of population. In such cases, the following formula can be adopted. This is nothing new, since the students would have studied it in the lower classes when they studied compound interest; the only difference is that we now use logarithms:

P_n = P_0 (1 + r)^n

where
P_n = value at the end of the nth period,
P_0 = value at the beginning of the period,
r = rate of change,
n = the number of periods (years).

Taking logarithms, \log(1 + r) = \frac{\log P_n - \log P_0}{n}. Hence, if we know P_0, P_n and n (since we are interested in the average), we can calculate the average rate of change r; in the same way, we can calculate the geometric average of P_0 and P_n.
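Solving P_n = P_0 (1 + r)^n for r with logarithms can be sketched as below. The function name is hypothetical; the figures are those of the population table above (200 millions in 1901 and 290 millions in 1931, n = 3 decades).

```python
import math


def compound_rate(p0, pn, n):
    """Solve Pn = P0 (1 + r)^n for r using log(1 + r) = (log Pn - log P0) / n."""
    log_growth = (math.log10(pn) - math.log10(p0)) / n
    return 10 ** log_growth - 1


r = compound_rate(200, 290, 3)     # 1901 -> 1931, three decades
print(round(r * 100, 1))           # 13.2  (% per decade)
```

This is the same average rate of 13.2% per decade obtained from the link relatives.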


Example

The populations of Tamil Nadu in 1961 and 1971 are given below. Calculate the average population.

Year    Population
1961    337 millions
1971    411 millions

n = 2 (number of items)

log 337 = 2.5276
log 411 = 2.6138
Total = 5.1414

Log (GM) = \frac{5.1414}{2} = 2.5707

∴ GM = antilog (2.5707) = 372.1


Instead of population, we can use the same figures as Rupees in lakhs:

1961    Rs. 337 lakhs
1971    Rs. 411 lakhs

Even then, the method is the same and the result is also the same.
Formula used

1. Ungrouped data:
   G = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right)
2. Discrete frequency distribution:
   G = \text{Antilog}\left(\frac{\sum f \log x}{N}\right)
3. Continuous frequency distribution:
   G = \text{Antilog}\left(\frac{\sum f \log x_i}{N}\right), where x_i stands for the mid-values of the class intervals.


Difference between Arithmetic Mean and Geometric Mean

1. Arithmetic Mean: The sum of the values divided by the number of items, say n, gives the Arithmetic Mean.
   Geometric Mean: The nth root of the product of the values is the Geometric Mean.

2. Arithmetic Mean: The average multiplied by n (n times the average) gives the total value of all the items, \sum x_i.
   Geometric Mean: The Geometric Mean raised to the power n, (GM)^n, gives the product of all the values, \prod x_i, where \prod stands for the symbol of multiplication.

3. Arithmetic Mean: All distributions having an equal number of items and the same total value will have the same average, even though the individual values of one distribution do not agree with their counterparts in the other.
   Geometric Mean: All series having an equal number of items and the same product will have the same Geometric Mean, even though the individual values of one distribution do not agree with their counterparts in the other.

4. Arithmetic Mean: Even if the value of one of the items is 0, the Arithmetic Mean can be calculated.
   Geometric Mean: If the value of one of the items is 0, the product of all the values will be 0, and consequently the Geometric Mean will be 0.

5. Arithmetic Mean: Even if the value of one of the items is negative, the Arithmetic Mean can be calculated.
   Geometric Mean: If the value of one of the items is negative (-ve), the product of all the items will be negative, and the Geometric Mean cannot be obtained as a meaningful real average (for an even number of items it is imaginary); the logarithm of a negative number also cannot be taken.


HARMONIC MEAN ( H.M ) 


While the Arithmetic Mean is based on the given values, the Harmonic Mean is calculated on the basis of the reciprocals of the given values.


Calculation of Harmonic Mean for ungrouped values

Let us consider the values 4, 5 and 6.

(1) We must find the reciprocal of each of the values:

Value    Reciprocal
4        1/4
5        1/5
6        1/6


Note: The product of a given value and its reciprocal is always 1. 1/4 is the reciprocal of 4, and 4 is the reciprocal of 1/4, since their product in both cases is 1.


(2) We must find the total value of the reciprocals:

\frac{1}{4} + \frac{1}{5} + \frac{1}{6} = \frac{30 + 24 + 20}{120} = \frac{74}{120} = \frac{37}{60}
(3) We must calculate the arithmetic mean of the reciprocals by dividing the total value of the reciprocals by the number of items:

Mean of the reciprocals = \frac{\text{Total of reciprocals}}{\text{No. of items}} = \frac{37}{60} \times \frac{1}{3} = \frac{37}{60 \times 3} = \frac{37}{180}


(4) Find the reciprocal of the mean of the reciprocals, which is equal to the Harmonic Mean:

Mean of the reciprocals = \frac{37}{180}

Reciprocal of the mean of the reciprocals = \frac{1}{37/180} = \frac{180}{37}

∴ Harmonic Mean = 4\frac{32}{37} (about 4.86)


Definition of Harmonic Mean 


Now we can define Harmonic Mean . It is the reciprocal 
of the Arithmetic Mean of the reciprocals of the given values . 


Let us examine the formula with an example. 


Find the Harmonic Mean of the following values: 20, 25, 30, 35, 40.

Their reciprocals are 1/20, 1/25, 1/30, 1/35, 1/40.

Values       Reciprocals
x_1 = 20     1/x_1 = 1/20
x_2 = 25     1/x_2 = 1/25
x_3 = 30     1/x_3 = 1/30
x_4 = 35     1/x_4 = 1/35
x_5 = 40     1/x_5 = 1/40

Number of items, n = 5.

Sum of the reciprocals = \frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \frac{1}{x_5} = \frac{1}{20} + \frac{1}{25} + \frac{1}{30} + \frac{1}{35} + \frac{1}{40}

Arithmetic Mean of the reciprocals = \frac{1/20 + 1/25 + 1/30 + 1/35 + 1/40}{5}

Reciprocal of the A.M. = \frac{5}{1/20 + 1/25 + 1/30 + 1/35 + 1/40} = \frac{n}{1/x_1 + 1/x_2 + 1/x_3 + 1/x_4 + 1/x_5}

Harmonic Mean = \frac{5}{0.050 + 0.040 + 0.033 + 0.029 + 0.025} = \frac{5}{0.177} = 28.25 (approximately)

Harmonic Mean = \frac{n}{\sum (1/x)}
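The definition can be sketched with exact fractions, so that the answer for 4, 5 and 6 comes out as 180/37 exactly. The function name is hypothetical and the sketch assumes non-zero integer values; for floating-point data, Python's `statistics.harmonic_mean` is the standard-library equivalent. For 20, 25, 30, 35 and 40 the exact result is 21000/743, about 28.26; the 28.25 above comes from rounding each reciprocal to three decimals.

```python
from fractions import Fraction


def harmonic_mean_exact(values):
    """H = n / sum(1/x): reciprocal of the mean of the reciprocals."""
    recip_sum = sum(Fraction(1, x) for x in values)   # exact for integer values
    return len(values) / recip_sum


print(harmonic_mean_exact([4, 5, 6]))                    # 180/37
print(float(harmonic_mean_exact([20, 25, 30, 35, 40])))  # about 28.26
```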
Calculation of Harmonic Mean from discrete frequency distribution

In the previous cases, each value occurs only once; in other words, the frequency of each value is one. So the reciprocal in the previous example can be interpreted in the light of the frequency, i.e. the reciprocal is the result obtained by dividing the frequency (1) by the given value. This interpretation is applied in the case of a discrete frequency distribution, and the formula undergoes the following change:

Original formula for ungrouped data: H = \frac{n}{\sum (1/x)}

Formula for discrete frequency distribution: H = \frac{\sum f}{\sum (f/x)} = \frac{N}{\sum (f/x)}

Each frequency has to be divided by the respective value; this takes the place of the reciprocals of the values used earlier.


Example

Value (x)   Frequency (f)   f/x
10          10              10/10 = 1.00
20          15              15/20 = 0.75
30          40              40/30 = 1.33
40          10              10/40 = 0.25
50           5               5/50 = 0.10
Total       80                      3.43

Mean of the reciprocals = 3.43/80
Reciprocal of the mean = 80/3.43
Harmonic Mean (or) H = 23.3
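For a discrete distribution, each frequency is divided by its value, as in the f/x column above. A sketch with a hypothetical function name, taking values and frequencies as parallel lists:

```python
def weighted_harmonic_mean(values, freqs):
    """H = N / sum(f / x), with N = sum(f)."""
    N = sum(freqs)
    return N / sum(f / x for x, f in zip(values, freqs))


h = weighted_harmonic_mean([10, 20, 30, 40, 50], [10, 15, 40, 10, 5])
print(round(h, 1))   # 23.3
```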
Calculation of Harmonic Mean from continuous frequency distribution

While in the case of a discrete frequency distribution the actual values are given, in the case of a continuous frequency distribution the classes with their lower and upper limits are given.

Therefore, we should first calculate the mid-value of each of the classes. Afterwards, the procedure will be as before.

Class      Frequency
0 - 10      5
10 - 20     7
20 - 30    12
30 - 40     4
40 - 50     2
Total      30


The above table will be replaced by the following table:

Mid-value (x)   Frequency (f)   f/x
5                5              5/5 = 1.00
15               7              7/15 = 0.47
25              12              12/25 = 0.48
35               4              4/35 = 0.11
45               2              2/45 = 0.04
Total           30                     2.10

Mean of the reciprocals = \frac{\sum (f/x)}{N} = \frac{2.10}{30}

Reciprocal = \frac{30}{2.10} = 14.29

Formula: H = \frac{N}{\sum (f_i/x_i)}, where x_i stands for the mid-value of the class.
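The continuous case only adds the mid-value step. In this sketch (hypothetical function name; classes as `(lower, upper, frequency)` tuples), the unrounded f/x terms sum to about 2.1054 and give about 14.25; the 14.29 above comes from rounding each f/x to two decimals.

```python
def grouped_harmonic_mean(classes):
    """classes: (lower, upper, frequency); H = N / sum(f / mid-value)."""
    N = sum(f for _, _, f in classes)
    return N / sum(f / ((lo + up) / 2) for lo, up, f in classes)


table = [(0, 10, 5), (10, 20, 7), (20, 30, 12), (30, 40, 4), (40, 50, 2)]
print(round(grouped_harmonic_mean(table), 2))   # 14.25
```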


Uses of Harmonic Mean

In this context, it must be clearly understood that any number and its reciprocal vary inversely with each other, since their product is always equal to 1. This indirectly indicates that the Harmonic Mean can be applied in cases where changes take place in inverse proportion. This can be explained further with the help of an example.


Let us consider the number of workers and their wages. These two variables, namely the total number of workers and their total wages, are directly proportional. In other words, if the value of one increases, the value of the other also increases, and if the value of one decreases, the value of the other also decreases.


Let us consider a problem which is very common . The 
distance between two places namely A and B is 24 km . 
A cyclist is travelling from A to B at an average speed of 4 km . 
per hour and returns from B to A at an average speed of 6 km . 
per hour . What is his average speed ? 


Anyone will be tempted at first sight to say that the average speed is 5 km per hour, i.e.

\frac{4 + 6}{2} = \frac{10}{2} = 5 km per hour, which is not correct.

An exercise of slight imagination will explain this.


The distance between A and B = 24 km.
The average speed of the journey from A to B = 4 km per hour.
∴ Time taken to travel from A to B = 24/4 = 6 hrs.
The average speed of the journey from B to A = 6 km per hour.
∴ Time taken to travel from B to A = 24/6 = 4 hrs.

Total distance travelled = 24 + 24 = 48 km.
Total time taken = 6 + 4 = 10 hrs.

∴ Average speed = \frac{\text{Total distance travelled}}{\text{Total time taken}} = \frac{48}{10} = 4.8 km per hour.


The difference between the first answer and the subsequent 
answer may be noted . The correctness of the second answer 
needs no special emphasis . The difference is due to the fact that 
the two variables or factors namely speed and time are inversely 
proportional. As the speed increases the time taken will decrease 
or as the speed decreases the time taken will increase . 
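The cyclist's correct answer, 4.8 km per hour, is exactly the Harmonic Mean of 4 and 6, because equal distances are covered at each speed. The sketch below (hypothetical function name) checks this against total distance divided by total time.

```python
def average_speed_equal_distances(speeds):
    """Average speed over equal distances = harmonic mean of the speeds."""
    return len(speeds) / sum(1 / s for s in speeds)


# Direct check: total distance / total time for the 24 km trip each way
distance = 24
total_time = distance / 4 + distance / 6                  # 6 h + 4 h
print(2 * distance / total_time)                          # 4.8
print(round(average_speed_equal_distances([4, 6]), 6))    # 4.8
```

If the cyclist had instead spent equal times (rather than equal distances) at each speed, the Arithmetic Mean would be the right average.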


The scope for the application of the Geometric Mean or 
Harmonic Mean is very limited . We mostly use only Arithmetic 
Mean . Therefore , we shall concentrate our study more on the 
Arithmetic Mean and its computation . 


List of formulae for Harmonic Mean

1. For ungrouped data:
   H = \frac{n}{\sum (1/x)}
2. For discrete frequency distribution:
   H = \frac{\sum f}{\sum (f/x)} = \frac{N}{\sum (f/x)}
3. For continuous frequency distribution:
   H = \frac{N}{\sum (f_i/x_i)}, where x_i stands for the mid-value.
Relationship between Arithmetic Mean, Geometric Mean and Harmonic Mean

For a given set of positive values, (1) the Arithmetic Mean will be greater than or equal to the Geometric Mean; (2) the Geometric Mean will be greater than or equal to the Harmonic Mean.

In other words, for a given set of positive values, the Arithmetic Mean is greater than or equal to the Geometric Mean, which in turn is greater than or equal to the Harmonic Mean:

Arithmetic Mean ≥ Geometric Mean ≥ Harmonic Mean.

The three means are equal only when all the values are equal.

The students would have already studied this property when they studied Progressions in Algebra.
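The inequality can be checked numerically with Python's `statistics` module (Python 3.8+ for `fmean` and `geometric_mean`). This is an illustration on the values 4, 5 and 6 from the Harmonic Mean example, not a proof.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

values = [4, 5, 6]
am, gm, hm = fmean(values), geometric_mean(values), harmonic_mean(values)
print(round(am, 3), round(gm, 3), round(hm, 3))   # 5.0 4.932 4.865
print(am >= gm >= hm)                             # True

same = [7, 7, 7]   # equal values: the three means coincide (up to rounding)
print(math.isclose(geometric_mean(same), 7) and math.isclose(harmonic_mean(same), 7))  # True
```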


Weighted Mean 


When we studied the different means, namely Arithmetic Mean, Geometric Mean and Harmonic Mean, we considered three types under each case:

1. Ungrouped data
2. Discrete frequency distribution
3. Continuous frequency distribution

These can be broadly classified under two kinds:
1. Ungrouped data (which involves no frequency)
2. Frequency distribution.

In the case of ungrouped data, each value is considered only once. In other words, the frequency of each value is the same or uniform. When we say the frequencies of all the values are the same, it does not mean that the frequency in each case is 1; the frequency may be 1 or any other value. But whatever the value of the frequency, it is the same for all the values.


Simple Mean: Whenever the Mean (Arithmetic Mean, Geometric Mean or Harmonic Mean) is calculated without reference to the frequency, it is called a Simple Mean.

Weighted Mean: Whenever the Mean (Arithmetic Mean, Geometric Mean or Harmonic Mean) is calculated with reference to the frequency, it is called a Weighted Mean. In the case of a Weighted Mean, the frequency serves as the weight. In other cases, as in the case of the average yield of a crop in a district, the area under the crop in the different taluks may serve as the weight.

However, the simple mean and the weighted mean are one and the same when the frequencies in all cases are the same or uniform. Let us examine this:


Value (x)   Frequency (f)   f.x
(1)         (2)             (3)
10          5                50
30          5               150
40          5               200
50          5               250
Totals: Σx = 130, Σf = 20, Σfx = 650

Mean = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N} = \frac{650}{20} = 32.5

If we add all the values given in col. (1), we get 130.

The number of items = 4.

∴ Average = \frac{130}{4} = 32.5


We find that the means calculated in both ways, i.e. with frequency and without frequency, are one and the same:

Weighted Mean = \frac{\sum x_i f_i}{\sum f_i} = \frac{(10 \times 5) + (30 \times 5) + (40 \times 5) + (50 \times 5)}{5 + 5 + 5 + 5}

This can be simplified by taking 5 outside the bracket in the numerator and in the denominator:

= \frac{5(10 + 30 + 40 + 50)}{5(1 + 1 + 1 + 1)}

Cancelling 5 in both the numerator and the denominator, we get

= \frac{10 + 30 + 40 + 50}{1 + 1 + 1 + 1} = \frac{\sum x}{n} = \frac{130}{4} = 32.5
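The equality of the simple and weighted means under uniform weights can be sketched as follows. The function names are hypothetical, and the unequal weights in the last line are made-up data, included only to show that the two means differ once the weights vary.

```python
def weighted_mean(values, weights):
    """Sum(w * x) / Sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def simple_mean(values):
    return sum(values) / len(values)


x = [10, 30, 40, 50]
print(weighted_mean(x, [5, 5, 5, 5]))   # 32.5
print(simple_mean(x))                   # 32.5
print(weighted_mean(x, [1, 2, 3, 4]))   # 39.0  (unequal weights)
```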


Therefore, what we have studied under discrete and continuous 
frequency distribution under all categories of means are nothing 
but weighted mean , whereas what we have studied under ungrouped 
data is simple mean . 


Comparative merits of different measures of Central tendency

All Means

They have well-defined formulae and are easily amenable to algebraic treatment. While the Arithmetic Mean can be easily computed, the computation of the Geometric Mean and the Harmonic Mean involves certain difficulties. While the Arithmetic Mean can be easily understood, the other two require some imagination. These measures cannot be computed from a graph. All of them are expressed in the same units as the original data. They are easily influenced by extremely low or extremely high values. They are not affected by the position of the values; therefore, re-arrangement of the data in ascending or descending order is not necessary for computation. Cumulative frequencies cannot be used.
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Median , Quartiles, Quintiles, Deciles and Percentiles 


These have well-defined formulae, and they are related to each other:

$$M = Q_2 = D_5 = P_{50}, \qquad Q_1 = P_{25}, \qquad Q_3 = P_{75}$$


They can be easily understood. They can also be computed from a graph or from the cumulative frequency distribution, since cumulative frequencies are used. They are not influenced by the magnitudes of the values; therefore, extremely high or low values have no influence on them. They are influenced by position; therefore, the data must be arranged in order of magnitude. They are also expressed in the same units as the original values. They have no direct relationship to the Means, although in a symmetrical distribution the Median is equal to the Arithmetic Mean.
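The relations Median = Q₂ = D₅ = P₅₀, Q₁ = P₂₅ and Q₃ = P₇₅ can be checked with Python's standard `statistics` module (Python 3.8 or later). This is only a sketch: `statistics.quantiles` with its default `method='exclusive'` uses one particular interpolation rule based on (n + 1) positions, and other rules may give slightly different quartiles. The data are those of Exercise (4) below.

```python
import statistics

# Data from Exercise (4) of this chapter.
data = [10, 12, 15, 29, 40, 12, 13, 15, 16, 17]

median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)     # three quartile cut points
deciles = statistics.quantiles(data, n=10)       # D1 .. D9
percentiles = statistics.quantiles(data, n=100)  # P1 .. P99

print("Median:", median)
print("Q2:", q2, " D5:", deciles[4], " P50:", percentiles[49])
print("Q1:", q1, " P25:", percentiles[24])
print("Q3:", q3, " P75:", percentiles[74])
```

For these data the median, Q₂, D₅ and P₅₀ all come out as 15.0, Q₁ and P₂₅ as 12.0, and Q₃ and P₇₅ as 20.0.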


Mode 


It has its own formulae for computation, but both of them are crude. The Mode is also not affected by the magnitudes of the values; it is influenced only by the position of the items, and extreme values have no influence on it. It can be computed from the frequency distribution and is expressed in the same unit as the original values. The introduction of new items may alter its position. A distribution may have more than one Mode: it may be unimodal, bimodal, trimodal or multimodal. Arrangement of the data in order of magnitude is necessary. The Mode can also be computed from a graph. It has a direct (empirical) relationship with the Arithmetic Mean and the Median; in other words, it can be computed from them by using either of the two formulae:


$$(1)\quad \text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median})$$

$$(2)\quad \text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
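Formula (2) is simply formula (1) rearranged. The sketch below applies it to hypothetical figures (a mean of 32.5 and a median of 30, chosen only to show the arithmetic) and checks that formula (1) then holds. The relation is empirical, so the result is only an approximation to the Mode, best suited to moderately skewed distributions; the function name `empirical_mode` is illustrative.

```python
def empirical_mode(mean, median):
    """Mode estimated from the empirical relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean


# Hypothetical figures chosen only to illustrate the arithmetic.
mean, median = 32.5, 30.0
mode = empirical_mode(mean, median)
print(mode)  # 3 * 30 - 2 * 32.5

# Formula (1) holds for the same numbers: Mean - Mode = 3 (Mean - Median).
assert mean - mode == 3 * (mean - median)
```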
Exercises

(1) Calculate the Mean, Median, Mode and Quartiles for the following data.

Class       f
0 - 10      3
10 - 20     5
21 - 30     7
31 - 40     8
41 - 50     2
Total      25


(2) Calculate the combined average weight of the two groups.

           No. of students    Average weight
Group A          28               45 kg
Group B          42               43 kg


(3) Calculate the Mean and Median.

x     10    15    20    25    30
f      2    21    25    17     5
(4) Calculate the different measures of central tendency.


10 , 12 , 15 , 29 , 40 , 12 , 13 , 15 , 16 , 17 . 


(5) Calculate the G.M. of the following data.

(i)      10     15     18     20     25
(ii)    120    280    320    420    550
(iii)  1200   2800   3200   4200   5500


(6) Calculate the G.M. for the following data.

(i)   x     10    15    18    20    25
      f      4     3     5     6     2

(ii)  x    120   150   180   290   350
      f      4     3     5     3     5


(7) Calculate the H.M. for the following data.

(i)     10    15    20    25    30
(ii)   120   150   180   200   250
(8) Calculate the H.M. for the following data.

(i)   x     10    15    18    20    25
      f      3     4     3     2     1

(ii)  x    150   180   200   225   250
      f      3     4     2     3     3


(9) Calculate the percentiles P… and P… for the following data.

Class       f
0 - 10      5
10 - 20     8
20 - 30     9
30 - 40    12
40 - 50     7
50 - 60     9


CHAPTER II 


MEASURES OF DISPERSION 


In the previous chapter, we studied that the values in a series have a tendency to cluster around certain values, and that those central values are known as measures of location or central tendency. We also studied that these measures are very helpful for comparing different distributions. In this chapter, we shall study some other measures which are just the opposite of the measures of location or concentration: they indicate the variation or dispersion of the values. Hence these new measures are called Measures of Dispersion.


Before we proceed further, let us consider the business of three merchants dealing in perishable articles like vegetables. We shall examine the business from the sales figures and decide the utility of continuing the business.


Day           Merchant A    Merchant B    Merchant C
              Sales Rs.     Sales Rs.     Sales Rs.
Monday            40            35            40
Tuesday           75            50            40
Wednesday         25            45            40
Thursday          80            40            40
Friday            50            30            40
Saturday          30            40            40
----------------------------------------------------
Total sales      300           240           240
Average sales per day for A = 300/6 = Rs. 50
                      for B = 240/6 = Rs. 40
                      for C = 240/6 = Rs. 40


The total weekly (6-day) sales of A amount to Rs. 300, while those of merchants B and C amount to Rs. 240 each. Hence we may naturally decide that A is getting better business than B and C. We arrive at the very same conclusion by comparing the average daily sales instead of the total sales for 6 days: the average sale of A comes to Rs. 50 per day, while that of B and C is only Rs. 40. Hence the business of A appears better than that of B and C. A person who is not prudent enough may come to the above conclusion.


Let us now examine the trend of business on each day. In the case of A we find a wide fluctuation, ranging from Rs. 25 to Rs. 80, while in the case of B the sales vary from Rs. 30 to Rs. 50. In the case of C there is no variation at all. The variation in the case of B is not as wide as in the case of A. The gap between the highest and the lowest sales in the case of A is Rs. 55 (80 − 25), while in the case of B the gap is only Rs. 20 (50 − 30), which is 50% of his average sales. In the case of C it is 0. Because of this we may naturally comment that the business of B is steadier than that of A, and that the business of C is the steadiest. In the case of A, the sales on Tuesday reach Rs. 75. Expecting the same high sales on the next day, if he purchases a greater quantity, he will sustain a heavy loss, since the sales on Wednesday fall to Rs. 25, leaving a heavy stock of perishable goods. If instead he reduces his purchases because of the poor sales of the previous day, he finds a good demand on the next day, and unless he has sufficient stock he will lose customers. A business with such vagaries in its sales is really not a good one. Though the sales of A are better from the point of view of average sales, the sales of C are better from the point of view of steadiness.
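The figures quoted in this comparison (total, average and the gap between the highest and lowest daily sales) can be reproduced with a short Python sketch; the helper name `summarise` is illustrative.

```python
# Daily sales (Rs.) from Monday to Saturday, as in the table above.
sales = {
    "A": [40, 75, 25, 80, 50, 30],
    "B": [35, 50, 45, 40, 30, 40],
    "C": [40, 40, 40, 40, 40, 40],
}


def summarise(daily):
    """Return (total, average, gap) where gap = highest sale - lowest sale."""
    total = sum(daily)
    return total, total / len(daily), max(daily) - min(daily)


for merchant, daily in sales.items():
    total, average, gap = summarise(daily)
    print(f"{merchant}: total={total} average={average} gap={gap}")
```

A has the largest total and average but also the widest gap (Rs. 55), while C has no gap at all, which is exactly the point the text makes about steadiness.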




From the above observation, we may infer that we should not judge the efficiency of a business by the total sales or the average sales alone; we must also take into consideration the variation or change in the day-to-day sales. Just as the values have a tendency to centre around a value, they also have a tendency to disperse, deviate or vary from the central value. Similar to the measures of central tendency, we can therefore have measures of variation, also called Measures of Dispersion or Measures of Deviation.


There are different measures of dispersion. They are (1) Range, (2) Mean Deviation, (3) Standard Deviation, (4) Variance, (5) Co-efficient of variation and (6) Semi-Interquartile deviation.


1. RANGE 




Range is the simplest measure of dispersion. It is defined as the difference between the highest and the lowest (the maximum and the minimum, or the largest and the smallest) values in the series or distribution. It is also expressed in the same unit as the original values.


Example (1): Weights of persons in a factory, in kg:

50, 60, 52, 45, 49, 35, 42, 40.

Maximum value = 60 kg
Minimum value = 35 kg
Range = 60 − 35 = 25 kg


Example (2): We shall consider another series of persons whose weights are given in some other unit (say, lb.):

120, 130, 125, 160, 112, 115, 140, 105.

Maximum value = 160 lb.
Minimum value = 105 lb.
Range = 160 − 105 = 55 lb.

In the above two cases, the range is 25 kg in one series and 55 lb. in the other. It is very difficult to compare the two distributions with the help of Ranges when the values of the distributions are expressed in different units, here kg and lb. Looking only at the numerical values, without the unit of measurement, we would say that the second series has the greater spread (55) while the first has only 25. In the case of measures of central tendency, the greater the value, the greater the importance; but in the case of dispersion, the smaller the value, the greater the importance. From this angle, we may say that series 1 is better, which may not be correct.

For easy comparison, the units of the values of different distributions should be the same. This can be seen from the following situation.

Suppose we want to study the income of industrial workers in different countries like Great Britain, Germany, Japan, Russia, the United States and India. The income of the workers in these countries will be expressed in terms of their own currency. Let us say that the average income and the range in their income are as follows:

Country              Average monthly income    Range in the income
1. Great Britain          300 Pounds                 75 Pounds
2. Germany                500 Marks                  45 Marks
3. Japan                  450 Yen                    50 Yen
4. Russia                 350 Roubles                35 Roubles
5. United States          700 Dollars               100 Dollars
6. India                  400 Rupees                 80 Rupees

The comparison is difficult, and the difficulty is due to the different units. For the sake of comparison we cannot do away with the units. Therefore, we have to think of an alternative method of comparison in which the units do not play a part. We should have relative measures.

Co-efficient of Dispersion (or) Co-efficient of Range

A relative measure is not expressed in any unit. It is free from units of measurement; it is a mere number. The relative measure based on the range is called the Co-efficient of dispersion. It is defined as follows:

Co-efficient of dispersion = (Difference between the largest and the smallest values) / (Sum of the largest and the smallest values)

If the largest value is denoted by L and the smallest value by S, the formula defining the Co-efficient of dispersion is

$$\text{Co-efficient of dispersion} = \frac{L - S}{L + S}$$

We shall calculate the co-efficient of dispersion for the two examples considered earlier.

                   Example I    Example II
Largest value          60           160
Smallest value         35           105
Sum                    95           265
Difference             25            55

$$\text{Co-efficient of dispersion (Example I)} = \frac{L - S}{L + S} = \frac{60 - 35}{60 + 35} = \frac{25}{95} = 0.26$$

$$\text{Co-efficient of dispersion (Example II)} = \frac{160 - 105}{160 + 105} = \frac{55}{265} = 0.21$$

We find that the second example has a smaller value of the co-efficient of dispersion. Hence the second is better.


(1) Calculate the co-efficient of dispersion for: 30, 45, 50, 70, 75.

Largest value L = 75
Smallest value S = 30

$$\text{Co-efficient of dispersion} = \frac{L - S}{L + S} = \frac{75 - 30}{75 + 30} = \frac{45}{105} = 0.43$$
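The Range and the Co-efficient of dispersion (L − S)/(L + S) can be computed together. This is a minimal Python sketch; the function name `range_and_coefficient` is illustrative, and it assumes positive values so that L + S is not zero.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for L = largest, S = smallest value.

    Assumes positive values, so that L + S > 0.
    """
    largest, smallest = max(values), min(values)
    return largest - smallest, (largest - smallest) / (largest + smallest)


for label, data in [
    ("Example I (kg)", [50, 60, 52, 45, 49, 35, 42, 40]),
    ("Exercise (1)", [30, 45, 50, 70, 75]),
]:
    r, coeff = range_and_coefficient(data)
    print(f"{label}: range = {r}, co-efficient = {coeff:.2f}")
```

The range keeps the unit of the data, while the co-efficient is a pure number (about 0.26 and 0.43 here), which is what makes distributions in different units comparable.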


(2) Calculate the Range and the co-efficient of dispersion for the following distribution:

Weight (lb.)    No. of students
40 - 50               4
50 - 60               2
60 - 70               5
70 - 80               2
80 - 90               1
Highest value L = 90 (upper limit of the highest class)
Smallest value S = 40 (lower limit of the lowest class)

∴ Range = L − S = 90 − 40 = 50 lb.

$$\text{Co-efficient of Range} = \frac{L - S}{L + S} = \frac{90 - 40}{90 + 40} = \frac{50}{130} = 0.38$$


Merits and Demerits of Range 


Range is easy to calculate and to understand, but it has its own defects. Range gives the difference between the highest and the lowest values of the variable. Therefore, we consider only two values, namely the highest and the lowest, and from these two values we come to a conclusion about the dependability of the distribution. But the distribution may have a number of values or units, and expressing an opinion about all the members of the distribution based on the observation of only two members may not be a sound proposition. Whatever we say should be based on the observed values of all the members in the group and not on only two members. Range may be a quick measure, but it has its own defects.


Defects

1. The highest and the lowest values occur only seldom.
2. The occurrence of a single extreme value, either highest or lowest, has a considerable effect on the value of the Range.
3. It is not representative of all the values in the series.


Therefore, we have to think of some alternative measures of deviation which throw light on the deviations of all the values, or of the values of all the members in the group. Such a measure should give an overall picture of all the members and be a true representative of all the members in the group.


2. QUARTILE DEVIATION (Q)


In the previous chapter, dealing with measures of central tendency, we studied two quartiles, namely Q₃, the upper quartile, and Q₁, the lower quartile. Q₃ is that value of the variable which divides the distribution into two parts such that 25% of the total number of units have values greater than Q₃ and the remaining 75% of the units have values less than Q₃. Q₁ is just the opposite: it is that value of the variable which divides the distribution into two parts such that 25% of the total number of members have values less than Q₁ and the remaining 75% have values greater than Q₁. Since 25% of the members lie above Q₃ and 25% lie below Q₁, the range Q₃ − Q₁ contains the middle 50% of the population units. This range is called the Interquartile Range. If this range is divided by 2, we get the quartile deviation Q, which is otherwise called the Semi-Interquartile Range.


[Figure: Q₁ and Q₃ divide the distribution into three parts containing 25%, 50% and 25% of the units.]


Hence we can use the Semi-Interquartile Range as a measure of deviation; it is denoted by the letter Q.

$$Q = \frac{Q_3 - Q_1}{2}$$


Q is also expressed in the same units as the original values of the distribution. Hence the comparison of two or more distributions, each in different units of measurement, is difficult. We therefore consider here the co-efficient of quartile dispersion, which is free from the units of measurement:

$$\text{Co-efficient of quartile dispersion} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$


Example: Given Q₁ = 45 kg and Q₃ = 75 kg, calculate (1) the Quartile Deviation and (2) the Co-efficient of quartile dispersion.

1. Quartile Deviation:

$$Q = \frac{Q_3 - Q_1}{2} = \frac{75 - 45}{2} = \frac{30}{2} = 15 \text{ kg}$$

2. Co-efficient of quartile dispersion:

$$\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{75 - 45}{75 + 45} = \frac{30}{120} = 0.25$$
The co-efficient of quartile dispersion is free from units of measurement; it is expressed only as a number and is hence easy to compare. The computation is easy when the values of the quartiles are given, but it becomes laborious if the quartiles must first be computed from the original values. It is calculated with reference to the positions of the items and not the values of the items.
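Both quartile measures follow directly from Q₁ and Q₃. A minimal Python sketch using the figures of the worked example (the function names are illustrative):

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range Q = (Q3 - Q1) / 2, in the unit of the data."""
    return (q3 - q1) / 2


def coefficient_of_quartile_dispersion(q1, q3):
    """Unit-free ratio (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


q1, q3 = 45, 75   # kg, from the worked example
print(quartile_deviation(q1, q3))                  # in kg
print(coefficient_of_quartile_dispersion(q1, q3))  # a mere number
```

The first value (15.0) carries the unit kg; the second (0.25) has no unit.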


3. MEAN DEVIATION (M.D.)

When we studied the Range, we saw that it is based on only two values, i.e. the maximum and the minimum. We also saw that the Quartile deviation is influenced by the positions of the items and not by their values. Now we shall see measures which are influenced by the values of the items, and by the values of all the items in the series or distribution. One such measure is the Mean Deviation (M.D.) or Average Deviation.
Mean Deviation (M.D.)


For a given set of data we first calculate the average or Arithmetic Mean x̄. We then subtract the Mean from each value to find the difference or deviation (d) of each value from the Mean. Next we find the sum of the deviations of all the values from the Mean, Σd. This sum is divided by the number of items n:

$$\frac{\sum d}{n}$$

This is called the Average or Mean deviation per item.


Let us consider the following example:

Day           Shop I Sales (Rs.)    Shop II Sales (Rs.)
Monday               40                    35
Tuesday              75                    50
Wednesday            25                    45
Thursday             80                    40
Friday               50                    30
Saturday             30                    40
Total               300                   240
Average       300/6 = Rs. 50        240/6 = Rs. 40


The daily sales of the two shops are given above. We find that the average sales of the shops are Rs. 50 and Rs. 40 per day respectively. Let us now calculate the difference between the sales of each day and the average sales.
Day                   Shop I                 Shop II
Monday            40 − 50 = −10          35 − 40 = −5
Tuesday           75 − 50 = +25          50 − 40 = +10
Wednesday         25 − 50 = −25          45 − 40 = +5
Thursday          80 − 50 = +30          40 − 40 = 0
Friday            50 − 50 = 0            30 − 40 = −10
Saturday          30 − 50 = −20          40 − 40 = 0
Total difference        0                      0
Average difference   0/6 = 0                0/6 = 0


We find that the total difference is 0 in both cases; consequently, the average difference is also 0. Not only in these two cases but in all cases, the sum of the deviations will be equal to 0 if the deviations are calculated from the Arithmetic Mean. This is one of the important properties of the Arithmetic Mean, which we studied in the previous chapter. Some of the deviations are positive, since those values are greater than the Mean, and the rest are negative, since those values are smaller than the Mean. The sum of all the positive deviations is equal in magnitude to the sum of all the negative deviations, so they cancel each other. Thus the total of the deviations is equal to 0.
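This property can be checked exactly with Python's `fractions.Fraction`, which avoids rounding even when the mean is not a whole number. The sketch below is illustrative (the helper name `deviations_from_mean` is not from the text); the third data set, with mean 7/3, is added only to show a non-integer mean.

```python
from fractions import Fraction


def deviations_from_mean(values):
    """Return the deviations x - mean, computed with exact fractions."""
    mean = Fraction(sum(values), len(values))
    return [x - mean for x in values]


shop_1 = [40, 75, 25, 80, 50, 30]
shop_2 = [35, 50, 45, 40, 30, 40]

for data in (shop_1, shop_2, [1, 2, 4]):
    # Positive and negative deviations cancel, so each total is exactly 0.
    print(sum(deviations_from_mean(data)))
```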


In order to have an effective comparison we should overcome the above difficulty. The total deviation comes to 0 because of the occurrence of both positive and negative deviations. We can ignore the signs of the deviations; in other words, we consider only the numerical values without their signs, or treat all of them as positive figures. Considering only the numerical value is known as taking the absolute value. The sum of the absolute deviations divided by the number of units gives the average deviation, which is called the Mean Deviation.


Absolute Deviations in the previous examples

Days              Case (I)          Case (II)
Monday               10                 5
Tuesday              25                10
Wednesday            25                 5
Thursday             30                 0
Friday                0                10
Saturday             20                 0
Total               110                30
Mean deviation   110/6 = 18 1/3     30/6 = 5


The Mean Deviation of I is greater than the M.D. of II; hence II is more dependable. The smaller the value of the Mean Deviation, the greater the dependability.

$$\text{Mean Deviation} = \frac{\sum |x - \bar{x}|}{n}$$

The two vertical lines on either side indicate that the figures are absolute values, taken without any consideration of sign.
If frequencies are given, the formula is modified by multiplying each absolute deviation by the respective frequency:

$$\text{M.D.} = \frac{\sum f_i\,|x_i - \bar{x}|}{N}, \qquad \text{where } N = \sum f_i$$
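Both forms of the Mean Deviation, ungrouped and with frequencies, fit in one small Python function. This is a sketch (the name `mean_deviation` is illustrative); when no frequencies are supplied it treats every value as having frequency 1.

```python
def mean_deviation(values, freqs=None):
    """Mean deviation about the arithmetic mean.

    Ungrouped data:   sum |x - mean| / n
    With frequencies: sum f |x - mean| / N, where N = sum f
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n


print(mean_deviation([40, 75, 25, 80, 50, 30]))  # Case I: 110/6
print(mean_deviation([35, 50, 45, 40, 30, 40]))  # Case II: 30/6
print(mean_deviation([10, 30, 40, 50], [5, 5, 5, 5]))
```

As with the weighted mean earlier, equal frequencies give the same result as the ungrouped formula.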


Defects

Ignoring the sign of the deviation may not be a good proposition, since it makes the differences artificial.


The peculiar situation of getting a total deviation of 0, or of taking only the absolute deviations and thereby considering artificial differences, can be avoided if we calculate the differences of the values from some value other than the Arithmetic Mean. For example, we could calculate the differences from the Monday value or the Tuesday value. But each person may select a different value for comparison, and there may then be no uniformity in the choice of the base. Unless there is uniformity in the choice of the base among the examiners, we cannot have an effective comparison, and the comparison becomes meaningless.

We can calculate the Mean deviation not only from the Mean but also from the Median or the Mode. But calculating the Mean deviation from the Median or the Mode involves considerable computation, since the Median and the Mode are themselves more cumbersome to calculate than the Mean. Therefore, calculating the Mean deviation from the Mean is easier and hence widely used.


Mean co-efficient of dispersion

Just as we calculated relative measures of dispersion from the Range and the Quartile deviation, we can also calculate a relative measure of dispersion from the Mean Deviation. This is called the Mean Co-efficient of dispersion and is calculated as follows:

$$\text{Mean co-efficient of dispersion} = \frac{\text{Mean Deviation from Mean}}{\text{Mean}} = \frac{\sum |x - \bar{x}| / n}{\bar{x}}$$
The Mean co-efficient of dispersion is expressed as a mere number, without any unit of measurement; therefore it helps in easy comparison. In the previous example, the Mean and the Mean Deviation are as follows:

                   Case I     Case II
Mean                 50          40
Mean Deviation     110/6        30/6

$$\text{Mean co-efficient of dispersion (Case I)} = \frac{110}{6 \times 50} = \frac{11}{30}$$

$$\text{Mean co-efficient of dispersion (Case II)} = \frac{30}{6 \times 40} = \frac{1}{8}$$
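Keeping the arithmetic in exact fractions reproduces the two results 11/30 and 1/8 directly. A minimal sketch, assuming a positive mean; the function name is illustrative.

```python
from fractions import Fraction


def mean_coefficient_of_dispersion(values):
    """(Mean deviation about the mean) / mean, kept as an exact fraction.

    Assumes the mean is not zero.
    """
    n = len(values)
    mean = Fraction(sum(values), n)
    mean_dev = sum(abs(x - mean) for x in values) / n
    return mean_dev / mean


print(mean_coefficient_of_dispersion([40, 75, 25, 80, 50, 30]))  # Case I
print(mean_coefficient_of_dispersion([35, 50, 45, 40, 30, 40]))  # Case II
```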


4. VARIANCE (V) = σ²


We have seen in the calculation of the Mean Deviation that a uniform base has to be adopted. Otherwise, each person would calculate the Mean Deviation from a different base, from the Mean, the Median, the Mode or any other arbitrary value as one desires. For the sake of uniformity, the Mean deviation is calculated from the Mean.

We have seen that when the deviation of each value is calculated from the Mean, the sum of the deviations is equal to 0. To overcome this, we suggested that the absolute values of the deviations, i.e. their numerical values without sign, be adopted. But this does not seem quite convincing; it appears to be an artificial way of overcoming the difficulty.


We can overcome the difficulty of the sign of the deviations in another way instead of ignoring it: we can square each deviation. The square of a deviation is never negative, whether the deviation itself is positive or negative, since the square of a negative quantity is always a positive quantity. We can then find the total sum of squares of deviations and divide it by the number of items (n) to get the average. This is called the Average Square of Deviations or Mean Square Deviation, and in statistics it is known as the Variance (v). We can examine this with the help of the example given below:


Case I

Day            Sales (Rs.)   Deviation           Square of the deviation
                   x         d = x − x̄           d² = (x − x̄)²
1. Monday         40         40 − 50 = −10             100
2. Tuesday        75         75 − 50 = +25             625
3. Wednesday      25         25 − 50 = −25             625
4. Thursday       80         80 − 50 = +30             900
5. Friday         50         50 − 50 = 0                 0
6. Saturday       30         30 − 50 = −20             400
Total            300                                  2650

Mean = 300/6 = Rs. 50

$$\text{Mean Square Deviation } (v) = \frac{2650}{6} = 441.7$$
Let us examine Case II:

Day            Sales (Rs.)   Deviation       d² = (x − x̄)²
                   x         d = x − x̄
1. Monday         35            −5                25
2. Tuesday        50           +10               100
3. Wednesday      45            +5                25
4. Thursday       40             0                 0
5. Friday         30           −10               100
6. Saturday       40             0                 0
Total            240                             250

Mean = 240/6 = 40

$$\text{Mean Square Deviation} = \frac{250}{6} = 41\tfrac{2}{3}$$


We find that the value of the Mean Square Deviation, or Variance, in the second case is smaller than that in the first case. Consequently, we can say that the sales in the second case are steadier and more reliable.
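The variance as defined here divides by n, which is what Python's `statistics.pvariance` (population variance) does; `statistics.variance` divides by n − 1 and would give different figures. The sketch below computes the Mean Square Deviation by hand and cross-checks it against `pvariance`; the helper name `mean_square_deviation` is illustrative.

```python
import statistics

shop_1 = [40, 75, 25, 80, 50, 30]
shop_2 = [35, 50, 45, 40, 30, 40]


def mean_square_deviation(values):
    """Variance as defined here: sum of (x - mean)^2 divided by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


for name, data in [("Shop I", shop_1), ("Shop II", shop_2)]:
    v = mean_square_deviation(data)
    # statistics.pvariance also divides by n (population variance).
    assert abs(v - statistics.pvariance(data)) < 1e-9
    print(f"{name}: variance = {v:.2f}")
```

This prints 441.67 for Shop I and 41.67 for Shop II, matching 2650/6 and 250/6.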


5. STANDARD DEVIATION σ (Sigma)

Standard Deviation is another measure of dispersion, called sigma and denoted by the small Greek letter σ. It is the most important measure or parameter in statistics and is widely used in all statistical applications. Hence greater care is required on the part of the students in the study and computation of the standard deviation. It can rightly be said that the science of statistics revolves around the Arithmetic Mean as the centre and the standard deviation as the radius.


For calculating the variance (v) or Mean Square Deviation (M.S.D.) we have followed this procedure:

(1) We found the sum of the given values by adding them all: Σx.
(2) We calculated the Arithmetic Mean by dividing the total by the number of items: x̄ = Σx/n.
(3) We subtracted the Arithmetic Mean from each of the given values to find the deviation of each value: d = x − x̄.
(4) We squared each of the deviations: d².
(5) We found the sum of the squares of all the deviations: Σd² = Σ(x − x̄)².
(6) We divided this sum of squares by the number of items to find the Mean Square of Deviation: Σd²/n = Σ(x − x̄)²/n.
(7) This Mean Square Deviation is nothing but the variance.
(8) The square root of the Mean Square Deviation is called the Standard Deviation, denoted by the letter σ:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$


To overcome the difficulty caused by the signs of the individual deviations, we squared them and finally arrived at the variance. But what we want is an average deviation, not an average squared deviation. Since we originally squared the individual deviations, we now take the square root of the variance. A square root has two values, one positive and one negative; we take the positive value.
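The eight steps listed above translate line by line into a short Python function. This is a sketch of the population standard deviation (dividing by n, as in the text); the function name is illustrative, and `math.sqrt` returns the positive root, which is the one we want.

```python
import math


def standard_deviation(values):
    """Population standard deviation, following steps (1)-(8) above."""
    n = len(values)
    total = sum(values)                      # (1) sum of the values
    mean = total / n                         # (2) arithmetic mean
    deviations = [x - mean for x in values]  # (3) d = x - mean
    squares = [d * d for d in deviations]    # (4) d squared
    sum_squares = sum(squares)               # (5) sum of the squares
    variance = sum_squares / n               # (6)-(7) mean square deviation
    return math.sqrt(variance)               # (8) positive square root


sigma = standard_deviation([40, 75, 25, 80, 50, 30])
print(round(sigma, 2))  # rupees per day
```

For the Shop I sales this gives about 21.02 rupees per day, the value rounded to Rs. 21 in the worked example.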


We shall examine the same example considered previously.

Day            Sales in Rs.   Deviation from Mean   Square of the deviation
                    x              x − x̄                (x − x̄)²
1. Monday          40              −10                    100
2. Tuesday         75              +25                    625
3. Wednesday       25              −25                    625
4. Thursday        80              +30                    900
5. Friday          50                0                      0
6. Saturday        30              −20                    400
Total             300                                    2650

Mean = 300/6 = Rs. 50 per day
Sum of squares of deviations = 2650; n = 6

$$\text{Mean Square Deviation} = \frac{2650}{6} = 441.7$$

$$\text{Standard Deviation} = \sqrt{441.7} \approx 21 \text{ rupees per day}$$


It should be noted that the Standard Deviation is always expressed in the same unit as the original items.
Root Mean Square Deviation


The Standard Deviation is otherwise known as the Root Mean Square Deviation. If we read this name from right to left (Deviation, Square, Mean, Root) instead of from left to right, it indicates the steps involved in computing the Standard Deviation:


1. Deviation: find the deviation of each value from the Mean: d.
2. Square: square each deviation: d².
3. Mean: find the mean of the squared deviations: Σd²/n.
4. Root: find the square root of the mean squared deviation: √(Σd²/n).


The smaller the value of the Standard deviation, the greater the reliability of the values of the distribution. In other words, it indicates that the values do not differ very much from one another. If the standard deviation is 0, all the values are the same.


In the same way, we can calculate the Root Mean Square Deviation from any value other than the Mean. But the Root Mean Square Deviation calculated from the Mean is taken as the standard; hence it is known as the Standard Deviation.


Co-efficient of Variation (C.V.)

Before we study more about the Standard Deviation, we should study another measure called the co-efficient of variation.


We have seen that the standard deviation is always expressed in the same unit of measurement as the original items. Therefore, distributions whose values are in different units of measurement will have standard deviations in different units as well. In such situations the comparison will not be effective, or rather not possible. If we want an effective comparison with the help of the standard deviation, we should get rid of the unit of measurement. Since the unit cannot simply be dropped, we overcome this by finding a relative measure free from units of measurement. For this purpose, the standard deviation σ is divided by the Arithmetic Mean x̄:

$$\frac{\sigma}{\bar{x}} = \frac{\text{Standard deviation}}{\text{Arithmetic Mean}}$$

This ratio is free from units of measurement and is a mere number. The ratio is usually small; hence it is multiplied by 100:

$$\frac{\sigma}{\bar{x}} \times 100$$

In this process, the standard deviation is expressed as a percentage of the Mean. This value is known as the co-efficient of variation and is represented by the letters C.V.:

$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$

In the above example, the Arithmetic Mean is Rs. 50 and the standard deviation is Rs. 21.

$$\therefore\ \text{Co-efficient of variation} = \frac{21}{50} \times 100 = 42 \text{ (a mere number)}$$
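The co-efficient of variation can be computed with Python's `statistics` module (`fmean` needs Python 3.8 or later). This sketch uses `pstdev`, the population standard deviation that divides by n as in this chapter; the function name is illustrative and a non-zero mean is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """C.V. = (population standard deviation / mean) x 100."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean * 100


cv = coefficient_of_variation([40, 75, 25, 80, 50, 30])   # Shop I
cv2 = coefficient_of_variation([35, 50, 45, 40, 30, 40])  # Shop II
print(f"Shop I: {cv:.1f}   Shop II: {cv2:.1f}")
```

Without rounding σ to Rs. 21, Shop I gives about 42.0; Shop II gives about 16.1. Because both are mere numbers, they can be compared directly even if the two shops' sales had been recorded in different currencies.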


Advantages in calculating the Mean Square Deviation from the Arithmetic Mean

One great advantage of calculating the variance and the standard deviation from the Mean is that they are then at their minimum. If we calculate the variance from any other value, either greater or less than the Mean, it will be greater than the one computed from the Mean. This gives another important property of the Arithmetic Mean, namely that the sum of the squares of deviations taken from the Arithmetic Mean is always the minimum.
The properties of the Arithmetic Mean can be summarised as follows:

1. The sum of the deviations taken from the Mean is always 0.
2. The sum of the squares of the deviations taken from the Mean is always the minimum.

Whatever applies to the sum of the deviations or the sum of the squares of deviations also applies to the average deviation, i.e. the Mean deviation or the Mean Square deviation. We have already verified the first property; now we shall verify the second property.
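Before the algebraic proof, a numerical check (not a proof) of the second property: the Python sketch below evaluates Σ(x − A)² for the Shop I sales at the Mean and at a few arbitrary bases A on either side of it. The helper name `sum_of_squares_about` and the chosen values of A are illustrative.

```python
def sum_of_squares_about(values, a):
    """Sum of (x - a)^2 over all values x."""
    return sum((x - a) ** 2 for x in values)


sales = [40, 75, 25, 80, 50, 30]   # Shop I, mean = 50
mean = sum(sales) / len(sales)

# The mean and a few arbitrary bases A on either side of it.
for a in [30, 45, mean, 55, 70]:
    print(a, sum_of_squares_about(sales, a))
```

The smallest total, 2650, occurs at the Mean; moving A by 5 on either side raises it to 2800, and moving A by 20 raises it to 5050.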


We shall calculate the Variance from another arbitrary value 
say A instead of the Arithmetic Mean . 


In this context it is useful to recall that (a + b)² = a² + 2ab + b². This formula is now applied here. The deviation of each value from an arbitrary value A can be written as the sum of the deviation of that value from the Mean and the difference between the Mean and the arbitrary value A:

x − A = deviation of the value from the arbitrary value
x − x̄ = deviation of the value from the Mean
x̄ − A = difference between the Mean and the arbitrary value

$$x - A = (x - \bar{x}) + (\bar{x} - A)$$

$$(x - A)^2 = \{(x - \bar{x}) + (\bar{x} - A)\}^2 = (x - \bar{x})^2 + 2(x - \bar{x})(\bar{x} - A) + (\bar{x} - A)^2$$

This is the expansion we get for one value. But there are many items, and we have to calculate the square of the deviation for each and every item. The sum of the squares of the deviations is obtained by adding these squares over all the items. This can be written as follows:


$$\underbrace{\sum (x - A)^2}_{(1)} = \underbrace{\sum (x - \bar{x})^2}_{(2)} + \underbrace{2\sum (x - \bar{x})(\bar{x} - A)}_{(3)} + \underbrace{\sum (\bar{x} - A)^2}_{(4)}$$

(1) Sum of the squares of the differences between the values and the arbitrary value A.

(2) Sum of the squares of the deviations of the values from the Mean.

(3) Twice the sum of the products of the differences $(x - \bar{x})$ and $(\bar{x} - A)$.

(4) Sum of the squares of the differences between the Mean and the arbitrary value A.


Let us take the term (3):

$$\sum 2(x - \bar{x})(\bar{x} - A)$$

2 is a constant number and hence can be taken outside the $\Sigma$ symbol.

$\bar{x}$ is a constant as far as a particular distribution is concerned, and A is also a constant number. Hence $(\bar{x} - A)$ is a constant number, and it can also be taken outside.

So we can take $2(\bar{x} - A)$ outside the sigma symbol:

$$\sum 2(x - \bar{x})(\bar{x} - A) = 2(\bar{x} - A)\sum (x - \bar{x})$$

But $\sum (x - \bar{x})$ is the sum of the deviations taken from the Arithmetic Mean, which is equal to 0. Hence the term (3) is equal to 0.


Hence

$$\sum (x - A)^2 = \sum (x - \bar{x})^2 + 0 + \sum (\bar{x} - A)^2$$

Since $(\bar{x} - A)$ is the same for all the N items, $\sum (\bar{x} - A)^2 = N(\bar{x} - A)^2$, and so

$$\therefore \sum (x - A)^2 = \sum (x - \bar{x})^2 + N(\bar{x} - A)^2$$

Sum of the squares of deviations from A = Sum of the squares of deviations taken from the Mean + Sum of the squares of the differences between the Mean and A.

Let us divide all the terms by N:

$$\frac{\sum (x - A)^2}{N} = \frac{\sum (x - \bar{x})^2}{N} + \frac{N(\bar{x} - A)^2}{N}$$

Since $\frac{N(\bar{x} - A)^2}{N} = (\bar{x} - A)^2$,

Variance about A = Variance about $\bar{x}$ + Square of the difference between $\bar{x}$ and A.


Let us denote the various terms as follows:

Variance about any arbitrary value (A) = $S^2$
Variance about the Mean = V
Difference between $\bar{x}$ and A = d

$$\therefore S^2 = V + d^2 \quad \text{or} \quad V = S^2 - d^2, \quad \text{i.e.} \quad V \le S^2$$

$S^2$ is minimum when $d = 0$, i.e. when $A = \bar{x}$. The minimum value of $S^2$ is $V = \sigma^2$.


So it is seen from the above that V, i.e. the Mean Square Deviation taken from the Arithmetic Mean, will always be the minimum. Since the Mean Square Deviation is the sum of the squares of the deviations divided by N, the sum of the squares of the deviations will also be the minimum when it is calculated from the Arithmetic Mean. It may be noted that $d = (\bar{x} - A)$ will be positive when A is less than $\bar{x}$, and negative when A is greater than $\bar{x}$. However, $d^2$ can never be negative, since the square of any number, either positive or negative, is positive; $d^2 = 0$ only when $A = \bar{x}$.
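The relation $S^2 = V + d^2$ can be checked numerically. The sketch below uses Python with exact fractions, so that no rounding creeps in, on the seven values 30, 80, 60, 70, 20, 40, 50 that appear in Example 2 of this section. The helper name `mean_square_about` is illustrative only.

```python
from fractions import Fraction


def mean_square_about(values, a):
    """Mean square deviation about an arbitrary value a: S^2 = sum((x - a)^2) / N."""
    return Fraction(sum((x - a) ** 2 for x in values), len(values))


values = [30, 80, 60, 70, 20, 40, 50]
mean = Fraction(sum(values), len(values))   # arithmetic mean = 50
v = mean_square_about(values, mean)         # variance about the mean = 400

for a in [20, 45, 50, 60, 75]:
    s2 = mean_square_about(values, a)
    d = mean - a
    # S^2 exceeds V by exactly d^2, so it is smallest when a equals the mean.
    assert s2 == v + d ** 2
    print(f"A = {a}: S^2 = {s2}, V + d^2 = {v} + {d ** 2}")
```

Trying every whole number from 0 to 100 as A and keeping the one with the smallest $S^2$ picks out A = 50, the Mean, as the derivation predicts.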


Advantages of Standard Deviation 

It is based on all the observations. It can be easily calculated and easily understood. It is amenable to algebraic treatment.
I. Direct Method

$$V = \frac{\sum (x - \bar{x})^2}{N}, \qquad \sigma = \sqrt{V}$$

where V is the variance and σ is the Standard Deviation.

In order to find the variance, we first calculate the average $\bar{x}$. Then the deviation of each value from the Mean is squared, $(x - \bar{x})^2$. Then we add the squares of all the deviations to find the sum of the squares of the deviations, $\sum (x - \bar{x})^2$.

This sum of the squares of the deviations is divided by N to get the Mean Square Deviation.


These various processes can be avoided, and we can calculate V from the original values themselves.


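We know that

$$V = \frac{\sum (x - \bar{x})^2}{N}$$

We also know that $(a - b)^2 = a^2 - 2ab + b^2$.

$$\therefore (x - \bar{x})^2 = x^2 - 2x\bar{x} + \bar{x}^2$$

When we take the sum of the squares of the deviations,

$$\sum (x - \bar{x})^2 = \sum x^2 - \sum 2x\bar{x} + \sum \bar{x}^2$$

(Since 2 is a constant number and $\bar{x}$ is a constant value as far as a particular distribution is concerned, $2\bar{x}$ can be taken outside the sigma symbol; and $\sum \bar{x}^2 = N\bar{x}^2$.)

$$= \sum x^2 - 2\bar{x}\sum x + N\bar{x}^2$$

But $\bar{x} = \frac{\sum x}{N}$, so $\sum x = N\bar{x}$.

$$\therefore \sum (x - \bar{x})^2 = \sum x^2 - 2N\bar{x}^2 + N\bar{x}^2 = \sum x^2 - N\bar{x}^2$$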
Let us divide it by N to get V:

$$V = \frac{\sum (x - \bar{x})^2}{N} = \frac{\sum x^2}{N} - \frac{N\bar{x}^2}{N} = \frac{\sum x^2}{N} - \bar{x}^2$$

i.e. V = Mean square of the original values - Square of the Mean.


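This can be further simplified.

$$V = \frac{\sum x^2}{N} - \bar{x}^2 \qquad (1)$$

Since $\bar{x} = \sum x / N$,

$$V = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 \qquad (2)$$

$$V = \frac{\sum x^2}{N} - \frac{(\sum x)^2}{N^2} = \frac{1}{N}\left\{\sum x^2 - \frac{(\sum x)^2}{N}\right\} \qquad (3)$$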


Students should clearly notice the difference between $\sum x^2$ and $(\sum x)^2$:

$\sum x^2$ = Sum of the squares of each value.
$(\sum x)^2$ = Square of the sum of all the values.

Suppose there are two items, 3 and 4:

$\sum x^2 = 3^2 + 4^2 = 9 + 16 = 25$
$(\sum x)^2 = (3 + 4)^2 = 7^2 = 49$

Any one of the three formulae can be followed.
Example 1: Find the standard deviation of the following series:

Weight of bags (kg), x      x²
67                        4489
75                        5625
80                        6400
83                        6889
85                        7225
93                        8649
97                        9409
91                        8281
98                        9604
Σx = 769            Σx² = 66571


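1st Method: Here N = 9 and $\bar{x} = 769/9 = 85.44$.

$$V = \frac{\sum x^2}{N} - \bar{x}^2 = \frac{66571}{9} - (85.44)^2 = 7396.78 - 7299.99 = 96.79$$

$$\sigma = \sqrt{96.79} = 9.8$$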


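2nd Method:

$$V = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 = 7396.78 - \left(\frac{769}{9}\right)^2 = 7396.78 - \frac{591361}{81} = 7396.78 - 7300.75 = 96.03$$

$$\sigma = \sqrt{96.03} = 9.8$$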


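3rd Method:

$$V = \frac{1}{N}\left\{\sum x^2 - \frac{(\sum x)^2}{N}\right\} = \frac{1}{9}\left\{66571 - \frac{769 \times 769}{9}\right\} = \frac{1}{9}\left\{66571 - \frac{591361}{9}\right\} = \frac{1}{9}(66571 - 65707) = \frac{1}{9} \times 864 = 96$$

$$\sigma = \sqrt{96} = 9.8$$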


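All the three formulae are one and the same; the small differences among 96.79, 96.03 and 96 arise only from rounding (for example, $\bar{x}$ was rounded to 85.44 in the first method). The first is the simplest and the easiest to remember. The second and third are nothing but modifications of the first formula. Therefore, we shall follow the first in our further calculations.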
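To see that the three formulae agree, and that the differences in Example 1 come only from rounding, the following Python sketch evaluates them with exact fractions on the nine weights of Example 1. The variable names are our own.

```python
import math
from fractions import Fraction

weights = [67, 75, 80, 83, 85, 93, 97, 91, 98]   # Example 1, in kg
n = len(weights)
sum_x = sum(weights)                             # 769
sum_x2 = sum(x * x for x in weights)             # 66571
mean = Fraction(sum_x, n)                        # 769/9, about 85.44

# Definition: mean of the squared deviations from the Mean.
v_definition = sum((x - mean) ** 2 for x in weights) / n
# Formula (1): sum(x^2)/N - mean^2
v1 = Fraction(sum_x2, n) - mean ** 2
# Formula (2): sum(x^2)/N - (sum(x)/N)^2
v2 = Fraction(sum_x2, n) - Fraction(sum_x, n) ** 2
# Formula (3): (1/N) * {sum(x^2) - (sum(x))^2 / N}
v3 = Fraction(1, n) * (sum_x2 - Fraction(sum_x ** 2, n))

assert v_definition == v1 == v2 == v3
print("exact variance:", v1, "=", round(float(v1), 2))
print("standard deviation:", round(math.sqrt(v1), 2))

# Rounding the mean to 85.44 first, as in the 1st Method, inflates V slightly.
v_rounded = Fraction(sum_x2, n) - Fraction("85.44") ** 2
print("with rounded mean:", round(float(v_rounded), 2))
```

With exact arithmetic all three formulae give 7778/81, about 96.02, so σ is 9.8 whichever formula is used; only the rounded mean pushes the first figure up to about 96.78.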


In the above example, we have worked out the variance and the standard deviation of ungrouped data by a shortcut, i.e. without calculating the deviations from the Mean. The formula used is:

$$V = \frac{\sum x^2}{N} - \bar{x}^2, \qquad \sigma = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} = \sqrt{V}$$

This is the formula for the direct method, written in terms of the original values.


Example 2: Calculate the variance and the Standard Deviation for the values 30, 80, 60, 70, 20, 40, 50.

Values (Rs.), x      x²
30                  900
80                 6400
60                 3600
70                 4900
20                  400
40                 1600
50                 2500
Σx = 350      Σx² = 20300

n = 7, Σx = 350, Σx² = 20300

$$\bar{x} = \frac{350}{7} = 50, \qquad \bar{x}^2 = 50 \times 50 = 2500$$
$$V = \text{Variance} = \frac{\sum x^2}{n} - \bar{x}^2 = \frac{20300}{7} - 2500 = 2900 - 2500 = 400$$

Standard Deviation: $\sigma = \sqrt{400}$ = Rs. 20 per item.


Shortcut Method

We shall adopt the shortcut method for the calculation of standard deviation. In this shortcut method we calculate the deviations from an assumed mean. The following steps are adopted:

1. Let us first assume an arbitrary value as A, say 60 in this case.
2. Calculate the deviation of each value from this assumed mean and denote it by the letter d: d = x - A.
3. Square each such deviation: $d^2 = (x - A)^2$.
4. Find the sum of the deviations, $\sum d$, and the sum of the squares of the deviations, $\sum d^2$.
5. Then calculate the standard deviation of d with the help of the following formula:

$$\sigma_d^2 = \frac{\sum d^2}{n} - \bar{d}^2, \qquad \sigma_d = \sqrt{\frac{\sum d^2}{n} - \bar{d}^2}, \qquad \text{where } \bar{d} = \frac{\sum d}{n}$$

When A = 0, d = x and the formula will become

$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 = \frac{\sum x^2}{N} - \bar{x}^2$$
x        d = x - A      d²
30          -30         900
80           20         400
60 (A)        0           0
70           10         100
20          -40        1600
40          -20         400
50          -10         100
         Σd = -70   Σd² = 3500

n = 7; $\bar{d} = \frac{-70}{7} = -10$; $\bar{d}^2 = (-10) \times (-10) = 100$.

$$\text{Standard Deviation} = \sqrt{\frac{\sum d^2}{n} - \bar{d}^2} = \sqrt{\frac{3500}{7} - 100} = \sqrt{500 - 100} = \sqrt{400}$$

= Rs. 20 per head.


The standard deviation of d is the same as the standard deviation of x.
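The assumed-mean shortcut can be sketched in Python as below. The function `sd_assumed_mean` is a hypothetical helper, not part of any library. It follows steps 1 to 5 above and shows that the answer does not depend on the choice of A.

```python
import math


def sd_assumed_mean(values, a):
    """Standard deviation by the shortcut method with assumed mean a.

    With d = x - a:  sigma^2 = sum(d^2)/n - (sum(d)/n)^2
    """
    n = len(values)
    d = [x - a for x in values]
    d_bar = sum(d) / n
    return math.sqrt(sum(di * di for di in d) / n - d_bar ** 2)


example_2 = [30, 80, 60, 70, 20, 40, 50]
example_3 = [67, 75, 80, 85, 83, 93, 97, 91, 98]

print(sd_assumed_mean(example_2, 60))            # A = 60
print(round(sd_assumed_mean(example_3, 83), 1))  # A = 83
# Any other assumed mean gives the same result, e.g. A = 0:
print(sd_assumed_mean(example_2, 0))
```

Choosing A near the centre of the data keeps the deviations small, which is the whole point of the method when the squaring is done by hand.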




Example 3

Calculate the standard deviation by the shortcut method. Let us assume 83 as A and calculate the deviation of each value from 83.

Weight of bags, x     d = x - A     d²
67                      -16        256
75                       -8         64
80                       -3          9
85                        2          4
83 (A)                    0          0
93                       10        100
97                       14        196
91                        8         64
98                       15        225
                     Σd = 22   Σd² = 918


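n = 9; Σd = 22; Σd² = 918.

$$\bar{d} = \frac{22}{9} = 2.44, \qquad \bar{d}^2 = 5.95$$

$$\text{Standard Deviation of } d = \sqrt{\frac{\sum d^2}{n} - \bar{d}^2} = \sqrt{\frac{918}{9} - 5.95} = \sqrt{102 - 5.95} = \sqrt{96.05} = 9.8$$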
This is the value that we obtained in our previous Example 1 under the Direct Method.
II. Calculation of Standard Deviation from Discrete Frequency Distribution

A. Direct Method

We can calculate the standard deviation by the direct method as follows:

1. We should first find the Arithmetic Mean of the distribution by multiplying each value by its respective frequency and dividing the total of the products by the total of the frequencies. The following formula can be used:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

2. The deviation of each value from the Mean, $(x - \bar{x})$, is calculated.
3. The square of each deviation, $(x - \bar{x})^2$, is calculated.
4. The square of each deviation is multiplied by the respective frequency, $(x - \bar{x})^2 f$.
5. The sum of the squares of the deviations multiplied by the frequencies, $\sum (x - \bar{x})^2 f$, is calculated.
6. The average of the squares of the deviations is calculated by dividing this sum by the sum of the frequencies:

$$V = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

This is called the Mean Square Deviation or Variance.
7. We calculate the square root of this Mean Square Deviation; this is the standard deviation.


The method is more or less the same as the method adopted for ungrouped data. The only difference is that we have used the frequency in two stages, as follows:
(1) We have multiplied each value by its frequency for calculating the Mean;
(2) We have multiplied the square of each deviation by its frequency.


Example 4: Calculate the Standard Deviation for the following distribution:

Value, x     Frequency, f      xf
30               3             90
80               5            400
60               6            360
70              10            700
20               3             60
40               2             80
50               1             50
            N = Σf = 30   Σfx = 1740

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1740}{30} = 58$$


x     f     x - x̄              (x - x̄)²    (x - x̄)² f
30    3     30 - 58 = -28        784         2352
80    5     80 - 58 = 22         484         2420
60    6     60 - 58 = 2            4           24
70   10     70 - 58 = 12         144         1440
20    3     20 - 58 = -38       1444         4332
40    2     40 - 58 = -18        324          648
50    1     50 - 58 = -8          64           64
     30                     Σ(x - x̄)² f = 11280

$$V = \frac{\sum (x - \bar{x})^2 f}{\sum f} = \frac{11280}{30} = 376$$

$\sigma = \sqrt{376}$ = Rs. 19.4 per head.
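Steps 1 to 7 of the direct method for a discrete frequency distribution translate directly into code. This Python sketch, with a function name of our own choosing, uses the values and frequencies of Example 4.

```python
import math


def sd_discrete_direct(values, freqs):
    """Direct method: V = sum(f * (x - mean)^2) / sum(f), sigma = sqrt(V)."""
    n = sum(freqs)                                          # N = sum f
    mean = sum(f * x for x, f in zip(values, freqs)) / n    # mean = sum fx / sum f
    v = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, v, math.sqrt(v)


values = [30, 80, 60, 70, 20, 40, 50]
freqs = [3, 5, 6, 10, 3, 2, 1]
mean, v, sd = sd_discrete_direct(values, freqs)
print(mean, v, round(sd, 1))
```

The frequency enters twice, exactly as noted above: once in the Mean and once in weighting each squared deviation.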


B. Shortcut Method

We can calculate the Standard Deviation by the shortcut method, using an assumed Mean. The various processes are the same as before.

Let us take 50 as the assumed Mean, denoted by A. Each deviation will be d = x - A and its square will be $d^2 = (x - A)^2$. The process is given below:


x     f     d = x - 50     fd      d²      fd²
30    3        -20         -60     400     1200
80    5         30         150     900     4500
60    6         10          60     100      600
70   10         20         200     400     4000
20    3        -30         -90     900     2700
40    2        -10         -20     100      200
50    1          0           0       0        0
  N = Σf = 30          Σfd = 240      Σfd² = 13200

$$\bar{d} = \frac{\sum fd}{N} = \frac{240}{30} = 8, \qquad \bar{d}^2 = 64$$

$$\text{Mean Square Deviation} = \frac{\sum fd^2}{N} - \bar{d}^2 = \frac{13200}{30} - 64 = 440 - 64 = 376$$

Standard Deviation = $\sqrt{376}$ = Rs. 19.4 per head.


In this process we find that the Standard Deviation of the original values and the Standard Deviation of the new variable obtained by the substitution d = x - A are one and the same. In this process, the labour involved in squaring the deviations is greatly reduced.
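The shortcut for a discrete distribution needs only the columns fd and fd². A minimal Python sketch, with a hypothetical helper `sd_discrete_shortcut`, reproduces Σfd = 240, Σfd² = 13200 and V = 376 for A = 50.

```python
import math


def sd_discrete_shortcut(values, freqs, a):
    """Shortcut method: V = sum(f d^2)/N - (sum(f d)/N)^2 with d = x - a."""
    n = sum(freqs)
    d = [x - a for x in values]
    sum_fd = sum(f * di for di, f in zip(d, freqs))
    sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))
    v = sum_fd2 / n - (sum_fd / n) ** 2
    return sum_fd, sum_fd2, v, math.sqrt(v)


values = [30, 80, 60, 70, 20, 40, 50]
freqs = [3, 5, 6, 10, 3, 2, 1]
print(sd_discrete_shortcut(values, freqs, 50))
```

Passing a different assumed mean, for instance A = 0, changes Σfd and Σfd² but leaves V at 376.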
III. Calculation of Standard Deviation from Continuous Frequency Distribution

A. Direct Method

The various steps involved are as follows:

(1) The mid-value of each class should be calculated; afterwards the methods are the same as in the case of a discrete frequency distribution.

(2) Each mid-value has to be multiplied by the respective frequency, and from this total the Arithmetic Mean has to be calculated by using the formula

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

where $x_i$ indicates the mid-value.

(3) The deviation of each value from the Mean, $x - \bar{x}$, has to be calculated.

(4) The deviation obtained has to be squared, $(x - \bar{x})^2$.

(5) Each square of the deviation has to be multiplied by the respective frequency, $(x - \bar{x})^2 f$.

(6) From the total of the squares of the deviations, we have to find the Mean Square Deviation by dividing it by $\sum f = N$:

$$V = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

(7) The square root of the Mean Square Deviation will be the Root Mean Square Deviation:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2 f}{\sum f}}$$
Example 5: Calculate the standard deviation for the following frequency distribution.

Classes (weight in kg)     Frequency (f)
60.5 - 70.5                    1
70.5 - 80.5                    5
80.5 - 90.5                    9
90.5 - 100.5                  14
100.5 - 110.5                 15
110.5 - 120.5                  4
120.5 - 130.5                  2
Total                         50

Mid values, x     f       f.x
65.5              1      65.5
75.5              5     377.5
85.5              9     769.5
95.5             14    1337.0
105.5            15    1582.5
115.5             4     462.0
125.5             2     251.0
Total            50    4845.0

$$\text{Mean } \bar{x} = \frac{\sum x f}{\sum f} = \frac{4845}{50} = 96.9$$
x        x - x̄     f     (x - x̄)²    f(x - x̄)²
65.5     -31.4     1     985.96       985.96
75.5     -21.4     5     457.96      2289.80
85.5     -11.4     9     129.96      1169.64
95.5      -1.4    14       1.96        27.44
105.5      8.6    15      73.96      1109.40
115.5     18.6     4     345.96      1383.84
125.5     28.6     2     817.96      1635.92
Total             50                 8602.00

$$\text{Variance} = \text{Mean Square Deviation} = \frac{8602.00}{50} = 172.04$$

$$\text{Root Mean Square Deviation} = \sqrt{172.04} = 13.1 \text{ kg}$$


There are many difficulties in calculation by the above method, since it involves squaring big numbers with decimals and multiplying the squares of such numbers by the frequencies. Therefore, we should think of an alternative shortcut method.
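By machine, the squaring of decimals is no burden, so the direct method for a continuous distribution is easy to sketch. The Python code below, with names of our own choosing, takes the class limits and frequencies of Example 5, forms the mid-values and follows steps (1) to (7).

```python
import math

classes = [(60.5, 70.5), (70.5, 80.5), (80.5, 90.5), (90.5, 100.5),
           (100.5, 110.5), (110.5, 120.5), (120.5, 130.5)]
freqs = [1, 5, 9, 14, 15, 4, 2]

mids = [(lower + upper) / 2 for lower, upper in classes]   # step (1): mid-values
n = sum(freqs)                                             # N = sum f = 50
mean = sum(f * x for x, f in zip(mids, freqs)) / n         # step (2)
# steps (3) to (6): mean square deviation about the mean
v = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / n
sd = math.sqrt(v)                                          # step (7)

print(round(mean, 2), round(v, 2), round(sd, 1))
```

Every member of a class is represented by the mid-value here, which is the same assumption the hand calculation makes.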


B. Shortcut Method 


There are two shortcut methods. In the first method, we use an assumed Mean, say A, and find the deviation of each value from the assumed Mean. Afterwards, we calculate the standard deviation of the deviations themselves; the standard deviation of the deviations is the same as the standard deviation of the original values.
Shortcut Method I

Let us assume 95.5 as A. The new table will be as follows:

Mid values, x   d = x - 95.5     f      d.f      d²      d²f
(1)             (2)             (3)    (4)      (5)     (6)
65.5            -30              1     -30      900      900
75.5            -20              5    -100      400     2000
85.5            -10              9     -90      100      900
95.5 (A)          0             14       0        0        0
105.5            10             15     150      100     1500
115.5            20              4      80      400     1600
125.5            30              2      60      900     1800
Total                           50      70              8700


Let us take 95.5 as the arbitrary value, A. Let a new variable d be calculated with the formula

d = x - A = x - 95.5; these values are given in col. (2).

Let us calculate the Mean and the Variance of d:

$$\bar{d} = \frac{\sum fd}{n} = \frac{70}{50} = 1.4$$

$$V(d) = \frac{\sum fd^2}{n} - \bar{d}^2 = \frac{8700}{50} - 1.4 \times 1.4 = 174 - 1.96 = 172.04$$

Standard deviation = $\sqrt{172.04}$ = 13.1 kg.
It can be argued in another way also. We have seen earlier that the Mean Square Deviation calculated from any arbitrary value will be greater than the variance calculated from the Mean, and the difference will be equal to the square of the difference between the Mean and the arbitrary value.

Mean Square Deviation about A:

$$174 = V(x) + (\bar{x} - A)^2 = V(x) + (96.9 - 95.5)^2 = V(x) + (1.4)^2 = V(x) + 1.96$$

$$V(x) = 174 - 1.96 = 172.04$$

The above method of calculation with an assumed Mean is called Changing the Base.


Method II

In this process we change not only the base but also the scale. We adopt the following substitution:

$$d = \frac{x - A}{C}$$

where A is an arbitrary value and C is the class interval. In this process we reduce each deviation $x - A$ to $1/C$th of its original value by dividing it by C. Let us calculate the standard deviation for the same continuous frequency distribution.

Mid values (in kg), $x_i$     Frequency, f
65.5                              1
75.5                              5
85.5                              9
95.5                             14
105.5                            15
115.5                             4
125.5                             2
After subtraction of the arbitrary value, the residual values are reduced as follows. Let us assume 95.5 as the arbitrary value. The value of C is equal to the class interval, namely 10.

$$\frac{65.5 - 95.5}{10} = \frac{-30}{10} = -3, \qquad \frac{75.5 - 95.5}{10} = \frac{-20}{10} = -2, \qquad \frac{85.5 - 95.5}{10} = \frac{-10}{10} = -1$$

$$\frac{95.5 - 95.5}{10} = \frac{0}{10} = 0, \qquad \frac{105.5 - 95.5}{10} = \frac{10}{10} = 1, \qquad \frac{115.5 - 95.5}{10} = \frac{20}{10} = 2, \qquad \frac{125.5 - 95.5}{10} = \frac{30}{10} = 3$$


The distribution of d will be as follows; afterwards, the same method that we adopted for the computation of the standard deviation of d in the first method can be adopted here also. Thus we can find the standard deviation of d.


x_i       d      f      fd     d²     fd²
(1)      (2)    (3)    (4)    (5)    (6)
65.5     -3      1     -3      9      9
75.5     -2      5    -10      4     20
85.5     -1      9     -9      1      9
95.5      0     14      0      0      0
105.5     1     15     15      1     15
115.5     2      4      8      4     16
125.5     3      2      6      9     18
Total           50      7            87

N.B.: Col. (5) can also be avoided; col. (6) can be computed directly by multiplying col. (2) and col. (4).
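$$N = \sum f = 50, \qquad \bar{d} = \frac{\sum fd}{N} = \frac{7}{50} = 0.14$$

$$\text{Variance of } d = V(d) = \frac{\sum fd^2}{N} - \bar{d}^2 = \frac{87}{50} - 0.14 \times 0.14 = 1.74 - 0.0196 = 1.7204$$

$$\text{Standard Deviation of } d = \sigma_d = \sqrt{1.7204} = 1.31$$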


We have already seen in the first method that the standard deviation of d is the same as the standard deviation of x when there is only a change of base. But in this process, we have changed not only the base but also the scale, by dividing by C, which is equal to 10 in this example.

Since each value of d is 1/10th of the corresponding deviation $x - A$, we can naturally expect that the standard deviation of d will also be equal to 1/10th of the standard deviation of x. In other words, the standard deviation of x will be 10 times (C times) the standard deviation of d.


$$\text{Variance of } x = V(x) = C^2 \times V(d) = 100 \times \text{Variance of } d = 100 \times 1.7204 = 172.04$$

Similarly,

$$\text{Standard deviation of } x = \sigma_x = 10 \times \text{Standard deviation of } d = 10\,\sigma_d = 10 \times 1.31 = 13.1 \text{ kg}$$
In this method, the original values of x, namely 65.5, 75.5, 85.5 etc., are reduced in size and replaced by -3, -2, -1, 0, 1, 2, 3, which are small numbers whose squares can be written without referring to any tables. Because of this, a lot of labour is saved in squaring and multiplication.


Formula: Standard Deviation of x = C × S.D. of d

$$\sigma_x = C \times \sigma_d$$
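The step-deviation method, with a change of both base and scale, can be sketched as follows. The helper `sd_step_deviation` is hypothetical; it computes σ_d from d = (x - A)/C and returns C × σ_d, which should agree with the direct computation of 13.1 kg.

```python
import math


def sd_step_deviation(mids, freqs, a, c):
    """Change of base and scale: d = (x - a) / c, then sigma_x = c * sigma_d."""
    n = sum(freqs)
    d = [(x - a) / c for x in mids]
    d_bar = sum(f * di for di, f in zip(d, freqs)) / n
    v_d = sum(f * di * di for di, f in zip(d, freqs)) / n - d_bar ** 2
    return c * math.sqrt(v_d)


mids = [65.5, 75.5, 85.5, 95.5, 105.5, 115.5, 125.5]
freqs = [1, 5, 9, 14, 15, 4, 2]
print(round(sd_step_deviation(mids, freqs, 95.5, 10), 1))
```

Any base and any scale give the same σ_x; for example A = 65.5 with C = 5 also yields about 13.1, because the factor C undoes the division exactly.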


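Method I: Let $d = x - A$. Taking the mean of both sides,

$$\bar{d} = \bar{x} - A$$

$$d - \bar{d} = (x - A) - (\bar{x} - A) = x - \bar{x}$$

$$(d - \bar{d})^2 = (x - \bar{x})^2, \qquad \sum (d - \bar{d})^2 = \sum (x - \bar{x})^2$$

If we divide by N,

$$\frac{\sum (d - \bar{d})^2}{N} = \frac{\sum (x - \bar{x})^2}{N}, \quad \text{i.e.} \quad V(d) = V(x), \quad \therefore \sigma_d = \sigma_x$$

Variance of d = Variance of x.
∴ Standard deviation of d = Standard Deviation of x.

Because of a change in the base alone, the value of the Standard Deviation does not undergo any change.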


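Let us consider the change in both the scale and the base.

Proof: Let

$$d = \frac{x - A}{C} = \frac{x}{C} - \frac{A}{C}$$

$$\therefore \bar{d} = \frac{\bar{x} - A}{C}$$

$$d - \bar{d} = \frac{x - A}{C} - \frac{\bar{x} - A}{C} = \frac{x - \bar{x}}{C}$$

$$(d - \bar{d})^2 = \frac{(x - \bar{x})^2}{C^2}, \qquad \sum (d - \bar{d})^2 = \frac{\sum (x - \bar{x})^2}{C^2}$$

If we divide by N we get

$$\frac{\sum (d - \bar{d})^2}{N} = \frac{\sum (x - \bar{x})^2}{N C^2}, \quad \text{i.e.} \quad \text{Variance of } d = \frac{\text{Variance of } x}{C^2}$$

By the rule of cross-multiplication we get

$$C^2 \times \text{Variance of } d = \text{Variance of } x, \qquad C^2 V(d) = V(x)$$

$$\therefore \sqrt{C^2 V(d)} = \sqrt{V(x)}, \qquad C \times \sigma(d) = \sigma(x)$$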


RETROSPECT 


As standard deviation occupies an important place in the study of statistics, it is better to have a retrospect of what we have studied. We have so far considered the computation of standard deviation for two types of data, namely (1) raw data or ungrouped data, and (2) grouped data, classified data or frequency distributions. In the case of frequency distributions also, we have considered two categories, namely (1) discrete frequency distributions and (2) continuous frequency distributions. Thus three types of data are considered: (1) ungrouped data; (2) discrete frequency distributions; and (3) continuous frequency distributions.
Under each category, we have seen two methods: (1) the Direct method and (2) the Shortcut method. In the case of continuous frequency distributions, two types of shortcut methods are adopted, using the substitutions:

(i) $d = x - A$
(ii) $d = \dfrac{x - A}{C}$

Now let us see one example in which the advantage of each method and the accuracy of the calculated value of the standard deviation can be seen. For this purpose let us examine the weights of 50 bags or bundles of some grain. This example has been considered (1) in the case of classification of data; (2) in the case of calculation of the Arithmetic Mean of a continuous frequency distribution; and (3) in the case of calculation of the standard deviation of a continuous frequency distribution.


The weights of the bundles in kg are given below:

67, 75, 127, 80, 85, 83, 93, 97, 91, 98,
98, 94, 102, 100, 102, 104, 105, 105, 103, 102,
121, 114, 79, 72, 82, 87, 88, 98, 107, 103,
90, 92, 98, 118, 111, 110, 106, 97, 109, 108,
107, 76, 89, 85, 88, 97, 91, 98, 112, 106.


Method I 


Let us consider this series as raw data and calculate the Standard Deviation by the shortcut method, i.e. without calculating the deviations from the Mean. We shall calculate the Standard Deviation from the squares of the original values.
The working is as follows:

x        x²          x        x²
67      4489         87      7569
75      5625         88      7744
127    16129         98      9604
80      6400        107     11449
85      7225        103     10609
83      6889         90      8100
93      8649         92      8464
97      9409         98      9604
91      8281        118     13924
98      9604        111     12321
98      9604        110     12100
94      8836        106     11236
102    10404         97      9409
100    10000        109     11881
102    10404        108     11664
104    10816        107     11449
105    11025         76      5776
105    11025         89      7921
103    10609         85      7225
102    10404         88      7744
121    14641         97      9409
114    12996         91      8281
79      6241         98      9604
72      5184        112     12544
82      6724        106     11236
Total of x: Σx = 4850; Total of x²: Σx² = 478480; N = 50.

$$\bar{x} = \frac{4850}{50} = 97$$

$$V = \frac{\sum x^2}{N} - \bar{x}^2 = \frac{478480}{50} - 97 \times 97 = 9569.60 - 9409 = 160.60$$

$$\sigma = \sqrt{160.60} = 12.7 \text{ kg}$$
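With a computer the 50 squares need not be written out by hand. The following Python sketch, with variable names of our own choosing, recomputes Σx, Σx², the variance and the standard deviation of the raw weights. It also counts the values falling in the classes 60.5-70.5, ..., 120.5-130.5 that are used for the grouped distribution.

```python
import math

weights = [67, 75, 127, 80, 85, 83, 93, 97, 91, 98,
           98, 94, 102, 100, 102, 104, 105, 105, 103, 102,
           121, 114, 79, 72, 82, 87, 88, 98, 107, 103,
           90, 92, 98, 118, 111, 110, 106, 97, 109, 108,
           107, 76, 89, 85, 88, 97, 91, 98, 112, 106]

n = len(weights)
sum_x = sum(weights)
sum_x2 = sum(x * x for x in weights)
mean = sum_x / n
v = sum_x2 / n - mean ** 2        # V = sum(x^2)/N - mean^2
sd = math.sqrt(v)
print(n, sum_x, sum_x2, mean, round(v, 2), round(sd, 1))

# Frequencies of the classes 60.5-70.5, 70.5-80.5, ..., 120.5-130.5
edges = [60.5 + 10 * k for k in range(8)]
freqs = [sum(lower < x < upper for x in weights) for lower, upper in zip(edges, edges[1:])]
print(freqs)
```

The class limits end in .5 while the weights are whole numbers, so no value can fall on a boundary and the strict inequalities are safe.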


Method II

Let us construct the frequency distribution:

Class             Frequency
60.5 - 70.5           1
70.5 - 80.5           5
80.5 - 90.5           9
90.5 - 100.5         14
100.5 - 110.5        15
110.5 - 120.5         4
120.5 - 130.5         2
Total                50


The calculation of the Standard Deviation from this frequency distribution was given earlier under continuous frequency distribution, where we calculated the Standard Deviation as 13.1. Now let us compare the results as follows:

                          From ungrouped data     From grouped data
1. Mean                   97 kg                   96.9 kg
2. Variance = V           160.60                  172.04
3. Standard Deviation     12.7 kg                 13.1 kg
Though there is no appreciable difference between the values of the Mean calculated in the two cases, the differences in the values of the variance and the standard deviation are appreciable.

It is seen that the values of the variance and the Standard Deviation calculated from the grouped data are greater than those calculated from the ungrouped data. Further, we also know that the measures calculated from the ungrouped data are correct, since they do not involve any assumption. But in the case of grouped data, one assumption is involved, namely that all the members in a particular group or class have a weight or value equal to the mid-value of that class. Because of this assumption, we find a difference in the Mean calculated from the grouped data. Similarly, we find differences in the values of V and σ calculated from the grouped data.

We know that the assumption is not exactly correct, and also that the difference in the values is due to this assumption, which arises out of the classification of the data. The number of classes depends upon the class interval. Hence the difference is due to the class interval. Therefore, if we want to know the correct value of V or σ, we have to apply a correction factor based on the class interval.


Sheppard's correction

The correction to be applied is known as Sheppard's correction. It is equal to $C^2/12$, where C is the class interval. Since the variance obtained from the grouped data has an upward tendency, the correction factor has to be subtracted from the variance obtained from the grouped data to get the correct variance of the data.

C = 10.

$$\text{Correction factor} = \frac{C^2}{12} = \frac{100}{12} = 8.33$$

$$\text{Variance obtained: } V(x) = \sigma^2 = 172.04$$

$$\therefore \text{Corrected variance} = 172.04 - 8.33 = 163.71$$

$$\text{Standard Deviation} = \sqrt{163.71} = 12.8$$
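Sheppard's correction is a one-line adjustment. The Python sketch below, with a function name of our own choosing, subtracts C²/12 from the grouped-data variance of 172.04 with C = 10.

```python
import math


def sheppard_corrected_variance(grouped_variance, class_interval):
    """Subtract Sheppard's correction C^2 / 12 from a grouped-data variance."""
    return grouped_variance - class_interval ** 2 / 12


v_corrected = sheppard_corrected_variance(172.04, 10)
print(round(v_corrected, 2), round(math.sqrt(v_corrected), 1))
```

The correction depends only on the class interval, so a coarser grouping (larger C) calls for a larger deduction.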
The correction factor has further reduced the difference.

                          For ungrouped data    For grouped data    Corrected value
Variance (V)              160.60                172.04              163.71
Standard Deviation (σ)    12.7                  13.1                12.8


The above method is more or less similar to the one considered earlier; the only difference is that we have considered the mid-values instead of all the raw data. But in this, considerable calculation in squaring and multiplication is involved. This can be avoided if we take an arbitrary value and compute the variance.


Characteristics of the Frequency Distribution and the Frequency Curve


Mean and Standard deviation are used as tools for comparing different distributions. The Mean indicates the central value of the distribution, while the standard deviation indicates the dispersion of the values around it. For comparison of different distributions, it is not necessary that the total number of units in the different distributions should be equal. However, we can make the total number of units in each distribution equal by converting each frequency into a percentage, so that the total of all the frequencies will be equal to 100 for all the distributions. By this method we can indirectly make the total frequencies of all the distributions equal. Further, the percentages of the frequencies of different classes can also be expressed as probabilities, so that the total of the probabilities of all the classes will be equal to 1 for all the distributions. In this way also, the total number of units in different distributions can be made equal.
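Converting frequencies to percentages or probabilities is a simple rescaling. This Python sketch, with a helper name of our own choosing, uses exact fractions and the frequencies 1, 5, 9, 14, 15, 4, 2 of Example 5; whatever the total frequency, the percentages add up to 100 and the probabilities to 1.

```python
from fractions import Fraction


def relative_frequencies(freqs):
    """Return (percentages, probabilities) for a list of class frequencies."""
    total = sum(freqs)
    percentages = [Fraction(100 * f, total) for f in freqs]
    probabilities = [Fraction(f, total) for f in freqs]
    return percentages, probabilities


percentages, probabilities = relative_frequencies([1, 5, 9, 14, 15, 4, 2])
print([float(p) for p in percentages])
print(sum(percentages), sum(probabilities))
```

Exact fractions are used so that the totals come out as exactly 100 and 1; with ordinary floating-point numbers they could differ from these by a tiny rounding error.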


We have already studied how to draw a frequency curve for a frequency distribution. Therefore, different frequency distributions can be compared by means of their frequency curves. The area under a curve will depend upon the total number of units in the distribution. As the total number of units differs from distribution to distribution, the area under the curve will naturally differ from curve to curve.


However, if we consider the percentages of the frequencies, or probabilities, instead of the actual frequencies or the actual number of units in different classes, we can make the areas of the different curves equal, since in all cases the total area will be equal to 100 (or to 1, since the total of all the probabilities is always equal to 1). This curve will be called the probability integral curve.


We have shown that the different values will be distributed 
around the Mean . Some of them will be less than the Mean and 
some others will be greater than the Mean . However, the area of 
all probability curves will be equal since the total area represents 1 . 


But the shape of the curves may differ either in height or in width. Because the area is constant, as the height increases the width or spread decreases, and vice versa. As the spread or dispersion decreases, the variation among the values of the different units decreases. If all the units have equal values, they will all be equal to the Mean, and the curve will be a straight line erected at the value equal to the Mean. The width of the curve can be divided into six divisions, three on either side of the Mean. The values in the portion on the left-hand side of the Mean will be less than the Mean, and those in the portion on the right-hand side of the Mean will be greater than the Mean. In order to make the comparison more effective, each division on either side can be taken as equal to 1. But the width of each division, though taken as equal to 1 in all cases, will differ in different curves depending on the value of the standard deviation of the concerned distribution. Generally numbers are started from 0. Hence the value of the Mean is taken as 0, the divisions on the left-hand side are denoted by -3, -2, -1 and those on the right-hand side by +1, +2, +3. This shows that the curves are distributed from -3 to +3, with 0 as the Mean in all cases. Normally the curve will be bell-shaped, and such curves are called Normal Curves.
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MOMENTS OF A FREQUENCY DISTRIBUTION 


The main characteristics of a frequency distributicn are 
expressed by certain constants called Moments which are calcu 
lated from the frequency distribution . Moments can be calculated 
from any arbitrary point. The moments calculated from the 
Arithmetic Mean are very important and they are 

called the 
Central Moments . We can also calculate the moment from 
the origin namely O and they are called Raw Moment . Moments 
of any order; can be calculated . The moments are denoted by 
the letter !!. The different orders of the moments are denoted 
by P , M , M , L , and so on . The rth Central Moment will be 
denoted by Hr , and rth Raw moment will be denoted bylle r . 
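rth Central Moment:
$$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}, \qquad N = \sum f_i$$

rth Raw Moment:
$$\mu'_r = \frac{\sum f_i (x_i - 0)^r}{N} = \frac{\sum f_i x_i^r}{N}$$

First Central Moment:
$$\mu_1 = \frac{\sum f_i (x_i - \bar{x})}{N} = 0$$

First Raw Moment:
$$\mu'_1 = \frac{\sum f_i x_i}{N} = \bar{x}$$

The Second Central Moment:
$$\mu_2 = \frac{\sum f_i (x_i - \bar{x})^2}{N} = \sigma^2 \text{ (Variance)} = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2 = \mu'_2 - (\mu'_1)^2$$

i.e. Second Central Moment = Second Raw Moment - Square
of the First Raw Moment.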
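As a check on the definitions above, here is a minimal Python sketch that computes raw and central moments of a frequency distribution and confirms that $\mu_1 = 0$ and $\mu_2 = \mu'_2 - (\mu'_1)^2$. The frequency table (values 1 to 5 with frequencies 2, 3, 5, 3, 1) is hypothetical, and the function names are our own.

```python
# Moments of a frequency distribution: values x_i with frequencies f_i.

def raw_moment(xs, fs, r):
    """r-th raw moment about the origin: sum(f * x**r) / N."""
    n = sum(fs)
    return sum(f * x**r for x, f in zip(xs, fs)) / n


def central_moment(xs, fs, r):
    """r-th central moment about the mean: sum(f * (x - mean)**r) / N."""
    n = sum(fs)
    mean = raw_moment(xs, fs, 1)          # the first raw moment is the mean
    return sum(f * (x - mean)**r for x, f in zip(xs, fs)) / n


# Hypothetical frequency table, for illustration only.
xs = [1, 2, 3, 4, 5]
fs = [2, 3, 5, 3, 1]

mu1 = central_moment(xs, fs, 1)           # zero, apart from rounding error
mu2 = central_moment(xs, fs, 2)           # the variance
m1 = raw_moment(xs, fs, 1)
m2 = raw_moment(xs, fs, 2)

print(mu1, mu2, m2 - m1**2)               # mu2 and m2 - m1**2 agree
```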
In this manner, the third and fourth Central and Raw Moments
can easily be calculated.


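$$\mu_3 = \frac{\sum f (x - \bar{x})^3}{N}$$

Let us consider the expansion of $(x - \bar{x})^3$:
$$(x - \bar{x})^3 = x^3 - 3x^2\bar{x} + 3x\bar{x}^2 - \bar{x}^3$$
$$\sum f (x - \bar{x})^3 = \sum f x^3 - 3\bar{x}\sum f x^2 + 3\bar{x}^2\sum f x - N\bar{x}^3$$

Dividing by N, and using $\mu'_1 = \bar{x}$, we get
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 3(\mu'_1)^2\mu'_1 - (\mu'_1)^3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$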

It may be seen that the Central Moment is expressed in terms
of the Raw Moments. In the same manner, the fourth Central Moment
can be written in terms of the Raw Moments as follows:


$$\mu_4 = \frac{\sum f (x - \bar{x})^4}{N} = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$
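The two conversion formulae can be checked numerically. The sketch below, using a hypothetical frequency table, computes $\mu_3$ and $\mu_4$ directly from their definitions and again from the raw moments; the two results should agree apart from rounding.

```python
def raw_moment(xs, fs, r):
    """r-th raw moment about the origin."""
    n = sum(fs)
    return sum(f * x**r for x, f in zip(xs, fs)) / n


def central_moment(xs, fs, r):
    """r-th central moment about the mean."""
    n = sum(fs)
    mean = raw_moment(xs, fs, 1)
    return sum(f * (x - mean)**r for x, f in zip(xs, fs)) / n


# Hypothetical frequency table, for illustration only.
xs = [1, 2, 3, 4, 5]
fs = [2, 3, 5, 3, 1]

m1, m2, m3, m4 = (raw_moment(xs, fs, r) for r in (1, 2, 3, 4))

# Central moments expressed through raw moments (formulae above).
mu3_from_raw = m3 - 3 * m2 * m1 + 2 * m1**3
mu4_from_raw = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4

print(central_moment(xs, fs, 3), mu3_from_raw)
print(central_moment(xs, fs, 4), mu4_from_raw)
```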


SKEWNESS OF A FREQUENCY DISTRIBUTION 


Mean and Standard deviation tell us about two important 
aspects of a frequency distribution . They are : ( 1 ) the central 
value and ( 2 ) the concentration of values around the central value . 
Another aspect of the frequency distribution is skewness. 


A distribution can be classified as symmetrical or non-symmetrical
on the basis of its frequencies. A distribution is said
to be symmetrical when the frequencies are distributed symmetrically,
i.e. equally, at equal intervals on either side of the Mode.
In other words, the frequencies at equal intervals on either side of
the Mode are equal.
Let us examine this with the help of a distribution.

Value        10   20   30   40   50   60   70   80   90
Frequency     1    2    4    6   10    6    4    2    1


Mode of the distribution is 50 . 


The values 40 and 60 are at equal distance (10) from the
Mode, one on either side. Similarly, the values 30 and 70 are at
equal distance (20) on either side of the Mode, the values 20 and
80 are at equal distance (30), and so are 10 and 90 (distance 40).
We can note the following from the frequency column.


The values 40 and 60, which are at equal distance from the
Mode, have the same frequency 6. In the same way the values
30 and 70 have the same frequency 4, the values 20 and 80 have
the same frequency 2, and the values 10 and 90 have the same
frequency 1. Values at equal distance on either side of the Mode
have equal frequencies. Such a distribution is called a
symmetrical distribution.
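The check just described can be written out as a short Python sketch. It takes the Mode as the value with the highest frequency, pairs each value below the Mode with the value at the same distance above it, and tests whether their frequencies agree. The pairing distances 10 to 40 are chosen to match this particular table.

```python
values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [1, 2, 4, 6, 10, 6, 4, 2, 1]

mode = values[freqs.index(max(freqs))]   # value with the highest frequency
table = dict(zip(values, freqs))

# Pair each value below the Mode with the value at the same distance above it.
pairs = [(mode - d, mode + d) for d in range(10, 50, 10)]
symmetric = all(table[lo] == table[hi] for lo, hi in pairs)

print(mode, symmetric)   # 50 True
```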
The curve of a symmetrical distribution

The curve of a symmetrical distribution will be as follows.
It will be a perfectly bell-shaped curve.


[Fig. 16: Symmetrical Curve (frequency on the y-axis against the values 10 to 90 on the x-axis)]


Any curve of the above type is called a symmetrical curve, and
the corresponding distribution is said to be symmetrical. Any departure
from symmetry is known as skewness, and the distribution is then said
to be skewed.


Properties of symmetrical distribution

1. In a symmetrical distribution the values of the Mean
and the Median coincide with the value of the Mode. In
other words, Mean, Median and Mode are equal.
Let $\bar{x}$ represent the Mean, M the Median and Z the Mode. Then
$$\bar{x} = M = Z$$


2. The Median lies midway between the lower quartile ($Q_1$)
and the upper quartile ($Q_3$). The distance between $Q_3$ and M
is the same as the distance between M and $Q_1$:
$$Q_3 - M = M - Q_1$$

3. The sum of the positive deviations from the Median is equal
in magnitude to the sum of the negative deviations from the Median.

4. The shape of the curve will be a perfect bell.
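Properties 1 to 3 can be verified on the symmetrical table given earlier. This sketch expands the table into its 36 individual observations and uses Python's `statistics` module. The quartiles come from `statistics.quantiles` with its default 'exclusive' method; this is one of several quartile conventions, chosen here only for convenience.

```python
import statistics

values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [1, 2, 4, 6, 10, 6, 4, 2, 1]

# Expand the frequency table into the individual observations.
data = [v for v, f in zip(values, freqs) for _ in range(f)]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method

# Property 3: deviations above and below the Median balance out.
positive = sum(x - median for x in data if x > median)
negative = sum(median - x for x in data if x < median)

print(mean, median, mode)        # property 1
print(q3 - median, median - q1)  # property 2
print(positive, negative)        # property 3
```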


Skewness 


The departure from symmetry of a frequency distribution
is known as skewness. For a perfectly symmetrical distribution
all the odd central moments, namely the first, third, fifth etc., are
equal to 0.

For a nearly symmetrical distribution the $\beta_1$ coefficient
will be a small positive quantity. For a considerable departure from
symmetry $\beta_1$ will be large. Hence the $\beta_1$ coefficient can be taken as
a measure of skewness.


We know that
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \therefore \sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}$$

The skewness may be either positive or negative depending
upon the sign of $\mu_3$.
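A minimal sketch of the moment coefficient, computing $\beta_1 = \mu_3^2 / \mu_2^3$ together with the sign of $\mu_3$. The symmetrical table of this section gives $\mu_3 = 0$; the second data set (1, 2, 3, 10, each with frequency 1) is hypothetical and has a long right tail, so $\mu_3 > 0$. Note that $\beta_1$ itself is never negative; the direction comes from $\mu_3$.

```python
def central_moment(xs, fs, r):
    """r-th central moment of a frequency distribution."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean)**r for x, f in zip(xs, fs)) / n


def beta1_and_mu3(xs, fs):
    """Return (beta1, mu3), where beta1 = mu3**2 / mu2**3."""
    mu2 = central_moment(xs, fs, 2)
    mu3 = central_moment(xs, fs, 3)
    return mu3**2 / mu2**3, mu3


# Symmetrical table of this section: mu3 = 0, so beta1 = 0.
b_sym, mu3_sym = beta1_and_mu3([10, 20, 30, 40, 50, 60, 70, 80, 90],
                               [1, 2, 4, 6, 10, 6, 4, 2, 1])
# Hypothetical data with one large value on the right: mu3 > 0.
b_right, mu3_right = beta1_and_mu3([1, 2, 3, 10], [1, 1, 1, 1])

print(b_sym, mu3_sym)
print(b_right, mu3_right)
```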




Positive skewness 


In a positively skewed distribution, the following characteristics
will be noticed:

1. The number of items on the right side of the highest
ordinate (height) of the curve will be more.

2. The Median will be greater than the Mode.

3. The Mean will be greater than the Median. The ascending order
of the values is (1) Mode, (2) Median, (3) Mean.

4. The frequency curve will have a long tail on the right
side.


[Fig. 17: Positive Skewness, with Z (Mode), M (Median) and the Mean marked on the x-axis]


Negative skewness 

The following characteristics can be noticed in the case of 
negative skewness. 


[Fig. 18: Negative Skewness; legend: x̄ = Mean, M = Median, Z = Mode]


1. The number of items on the left side of the highest ordinate
(height) will be more.
2. The Median will be greater than the Mean.
3. The Mode will be greater than the Median. The ascending order
of the values is (i) Mean, (ii) Median, (iii) Mode.
4. The frequency curve will have a long tail on the left.


Coefficient of skewness

In a symmetrical distribution the Mean, Median and
Mode coincide. In a skewed distribution they do not
coincide. Hence their difference can be taken as
a measure of skewness.


Measure of skewness: Mean - Mode

We have seen that the Mean will be greater than the Mode in the
case of positive skewness. Therefore, the difference Mean - Mode
will be positive. In the case of negative skewness, the Mode will
be greater than the Mean. Hence, Mean - Mode will be negative.


Relative measure of skewness

In addition to the $\beta_1$ coefficient, a second measure of skewness
is the coefficient of skewness, denoted by C, which is as follows:
$$C = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$

The above formula can be revised as follows:
$$C = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$$

This is known as Pearson's coefficient of skewness. As we have
already seen, for a moderately skewed distribution the difference
between the Mean and the Mode is approximately equal to three times
the difference between the Mean and the Median, so the second
formula can be used. When there is perfect symmetry, the Mean, Mode
and Median are equal to one another, and consequently C is equal
to 0. If the Mean is greater than the Median or Mode, the skewness
is positive; if it is less than the Median or Mode, the skewness
is negative.
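Here is a minimal sketch of both forms of Pearson's coefficient, using Python's `statistics` module and the population S.D. (divisor N). The right-tailed data set is hypothetical. It illustrates that the two forms agree in sign but not exactly, because "Mean - Mode = 3(Mean - Median)" holds only approximately.

```python
import statistics


def pearson_skewness(data):
    """Return ((Mean - Mode) / S.D., 3 (Mean - Median) / S.D.)."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)       # population S.D., divisor N
    first = (mean - statistics.mode(data)) / sd
    second = 3 * (mean - statistics.median(data)) / sd
    return first, second


# The symmetrical table of this section: both coefficients are 0.
values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [1, 2, 4, 6, 10, 6, 4, 2, 1]
symmetric = [v for v, f in zip(values, freqs) for _ in range(f)]

# Hypothetical data with a long right tail: both coefficients positive.
right_tailed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]

print(pearson_skewness(symmetric))
print(pearson_skewness(right_tailed))
```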
A third coefficient of skewness is given by the formula:
$$\text{Skewness} = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$$

We know that the distances of the two quartiles from the Median are
$Q_3 - M$ and $M - Q_1$. The difference of these is expressed as a
ratio to their sum:
$$\frac{\text{Difference}}{\text{Sum}} = \frac{(Q_3 - M) - (M - Q_1)}{(Q_3 - M) + (M - Q_1)} = \frac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$$

This measure is known as Bowley's measure of skewness.
For a symmetrical distribution the distances of the Median from
the lower quartile and the upper quartile are equal, i.e. $M - Q_1 = Q_3 - M$.
Therefore, the difference between the distances of the two quartiles
from the Median, divided by the Inter-Quartile Range ($Q_3 - Q_1$),
is taken as a measure of skewness.
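Bowley's measure needs only the quartiles and the Median. The sketch below takes them from `statistics.quantiles` with its default 'exclusive' method. Other quartile conventions can give slightly different values, so this should be read as one reasonable choice, not the book's prescribed method. The right-tailed data set is hypothetical.

```python
import statistics


def bowley_skewness(data):
    """(Q3 + Q1 - 2M) / (Q3 - Q1), quartiles from statistics.quantiles."""
    q1, m, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * m) / (q3 - q1)


values = [10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [1, 2, 4, 6, 10, 6, 4, 2, 1]
symmetric = [v for v, f in zip(values, freqs) for _ in range(f)]
right_tailed = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]   # hypothetical

print(bowley_skewness(symmetric))      # 0 for the symmetrical table
print(bowley_skewness(right_tailed))   # positive: longer right tail
```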


KURTOSIS 


The flatness or peakedness of a frequency curve is known
as Kurtosis. It depends upon the number of items near the Mode.
We judge the flatness or peakedness of a curve only with reference
to the normal curve, which is an ideal symmetrical curve.
Therefore, a measure of kurtosis tells us how near to or how far from
the normal curve a particular frequency curve is, i.e. how far the
given frequency distribution conforms to an ideal normal curve.
Among the various measures of kurtosis, the percentile measure
of kurtosis is the simplest. It is obtained by dividing the
difference between the 90th and 10th percentiles by the
quartile deviation:
$$\text{Percentile measure of Kurtosis} = \frac{\text{Difference between the 90th and 10th percentiles}}{\text{Quartile Deviation}} = \frac{P_{90} - P_{10}}{(Q_3 - Q_1)/2} = \frac{2(P_{90} - P_{10})}{Q_3 - Q_1}$$

Kurtosis is also defined by the moment coefficient
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
For a normal distribution $\beta_2 = 3$. If $\beta_2 > 3$ the peak is sharper
than that of the normal curve, and if $\beta_2 < 3$ the peak is flatter.


For an ideal normal curve, the percentile measure of kurtosis
is approximately equal to 3.8.

1. Mesokurtic: A distribution whose percentile measure of kurtosis
is equal to 3.8 has a curve called mesokurtic.


[Fig. 19: Kurtosis; curves labelled 1. Mesokurtic, 2. Leptokurtic, 3. Platykurtic]
2. Leptokurtic: If the percentile measure of kurtosis of a curve
is greater than 3.8, the curve is said to be leptokurtic, i.e. more
peaked at the top than the normal curve.

3. Platykurtic: If the percentile measure of kurtosis of a
distribution is less than 3.8, its frequency curve is said to be
platykurtic, i.e. flatter at the top than the normal curve.
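The value 3.8 quoted for the ideal normal curve can be reproduced from the exact percentiles of the standard normal distribution. The sketch below uses `statistics.NormalDist` (Python 3.8+). It also computes the moment coefficient $\beta_2 = \mu_4/\mu_2^2$ for a hypothetical flat distribution with equal frequencies, which comes out below 3 (platykurtic).

```python
from statistics import NormalDist


def beta2(xs, fs):
    """Moment coefficient of kurtosis: mu4 / mu2**2."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mu2 = sum(f * (x - mean)**2 for x, f in zip(xs, fs)) / n
    mu4 = sum(f * (x - mean)**4 for x, f in zip(xs, fs)) / n
    return mu4 / mu2**2


# Percentile measure for the ideal normal curve, from its exact percentiles.
z = NormalDist()                       # standard normal: mean 0, S.D. 1
p90, p10 = z.inv_cdf(0.90), z.inv_cdf(0.10)
q3, q1 = z.inv_cdf(0.75), z.inv_cdf(0.25)
percentile_measure = (p90 - p10) / ((q3 - q1) / 2)
print(round(percentile_measure, 2))    # 3.8

# Hypothetical flat distribution (equal frequencies): beta2 < 3.
flat_beta2 = beta2([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
print(flat_beta2)
```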


LORENZ CURVE 


Lorenz curve is another method of studying dispersion
by means of a graph. It is drawn from the cumulative frequencies;
in addition to the cumulative frequencies, we use the cumulative
values also. The various steps involved are as follows:

(1) Find the cumulative values and express them as percentages.
(2) Find the cumulative frequencies and express them as percentages.


[Fig. 20: Lorenz Curve; both axes are scaled from 0 to 100, with the diagonal running from the origin to (100, 100)]


(3) Plot the percentages of the cumulative frequencies on the
X-axis.
(4) Plot the percentages of the cumulative values on the
Y-axis.
(5) In both cases the minimum value is 0 and the maximum value
is 100. The straight line joining these two points, (0, 0) and
(100, 100), runs from the origin to the opposite corner. It is
a diagonal line.


Merits

It indicates how far the distribution deviates from the
average (equal) position. The average position is indicated by the
diagonal line joining the points with co-ordinates (0, 0)
and (100, 100); the farther the curve lies from this diagonal, the
greater the inequality.


Draw the Lorenz curve for the following distribution:

No. of workers      Total wages per mensem (Rs.)
  3                    2000
  7                    3000
 12                    4000
 15                    5000
 13                    6000
Total 50              20000


No. of workers     Cumulative frequency     Percentage of C.F.
  3                    3                        6
  7                   10                       20
 12                   22                       44
 15                   37                       74
 13                   50                      100
Wages (Rs.)     Cumulative wages     Percentage of wages
2000               2000                  10
3000               5000                  25
4000               9000                  45
5000              14000                  70
6000              20000                 100


The percentages of the cumulative frequencies (workers) and of the
cumulative values (wages) are as follows:

Percentage of cumulative workers     Percentage of cumulative wages
  6                                     10
 20                                     25
 44                                     45
 74                                     70
100                                    100


The above figures can be plotted on the X and Y axes as follows:

X:    6    20    44    74    100
Y:   10    25    45    70    100
Plot the points and join them by a smooth curve. From
the curve we can also find the percentage of total wages received
by 25% or 50% of the workers. For this, draw vertical lines at the
points corresponding to 25% and 50% on the X-axis to cut the curve.
The Y value of each point of intersection gives the percentage of
wages received by 25% and by 50% of the workers, respectively.

[Fig. 21: Lorenz Curves; both axes are scaled from 0 to 100]
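The arithmetic of this example can be reproduced in a short Python sketch. It builds the cumulative percentages from the original table and then reads off the Y value at X = 25% and X = 50%. The book joins the points by a smooth curve; the sketch instead uses straight-line interpolation between plotted points. That is an assumption made for simplicity, so the values read off will differ slightly from those taken from a hand-drawn curve.

```python
from itertools import accumulate

workers = [3, 7, 12, 15, 13]
wages = [2000, 3000, 4000, 5000, 6000]


def cumulative_percentages(values):
    total = sum(values)
    return [100 * c / total for c in accumulate(values)]


x = [0.0] + cumulative_percentages(workers)   # % of workers, from the origin
y = [0.0] + cumulative_percentages(wages)     # % of wages


def read_off(x_target, xs, ys):
    """Linear interpolation between plotted points (stand-in for the curve)."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x_target <= x1:
            return y0 + (y1 - y0) * (x_target - x0) / (x1 - x0)
    raise ValueError("x_target must lie between 0 and 100")


print(x)   # [0.0, 6.0, 20.0, 44.0, 74.0, 100.0]
print(y)   # [0.0, 10.0, 25.0, 45.0, 70.0, 100.0]
print(read_off(25, x, y), read_off(50, x, y))
```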


Exercise 


(1) Calculate the quartile deviation for the following data (marks):

52, 55, 57, 49, 54, 61, 64, 58, 63, 61.


(2) Calculate the semi-inter-quartile range.

x                   20    30    40    50    60    70
No. of students      5     7    20     8     4     6


(3) Calculate the mean deviation for the following data.

132, 104, 166, 143, 175, 158, 179, 189, 125, 140.
(4) Calculate the standard deviation and coefficient of
variation for the following data.
(i) 28, 39, 45, 60, 65.
(ii) 32, 53, 49, 28, 75.
(iii) 280, 580, 350, 750, 625.

(5) Calculate the S.D. and coefficient of variation.

(i)   x    25   35   45   55   65
      f     3    4    5    8    5

(ii)  x    10   20   30   40   50   60
      f     5    8    9    7    5    6

(6) Calculate the S.D. and coefficient of variation.

(i)   x    0-10   10-20   20-30   30-40   40-50   50-60
      f      4       7       8       9       5       7

(ii)  x    0-25   25-50   50-75   75-100   100-125
      f     11      15      28      12       14


CHAPTER III 


CORRELATION 


We have so far considered series having only one variable
or one characteristic, for example, the weight of a person, the
height of a person, the marks obtained by a student or the yield
obtained from a plot. But in actual practice we may have
to consider more than one variable or characteristic at a time,
for example, the height and weight of a person, or the quantity
of fertiliser applied and the quantity of yield obtained. Sometimes
each item may have three or more variables.


Correlation 

The values of different variables may be inter-related. For
example, the weight of a person may depend on the height of
the person, or the height and weight of a person may depend
upon the age of the person. The quantity of yield obtained may
depend upon the quantity of fertiliser applied. The relationship
between two or more variables is called correlation, and the
variables are said to be correlated. Sometimes the relationship
is also called covariation.


Relationship 

The term relationship can be used in three senses , namely 
( 1 ) mutual relationship ( 2) cause and effect relationship and 
( 3 ) general relationship . 




1. Mutual relationship

The price and demand of a commodity have a mutual relationship:
when the price of a commodity decreases, the demand for it may
increase. Sometimes, when the demand for it increases, the price
may also increase. Whenever there is a change in the value of one
variable there will be a change in the value of the other variable
also. The changes in the values depend mutually upon each other.


2. Cause and Effect 


Whenever the quantity of fertiliser applied increases, the yield
may also increase. In this case the cause of the increased
production is the quantity of fertiliser applied. Increased
rainfall may likewise cause an increase in production. In these
cases the relationship is one of cause and effect.


3. General Relation 


The production of paddy may increase with the increase
in the production of cotton, though the two may not have any direct
relationship. The increase in rainfall may cause an increase
in the production of paddy, cotton and other agricultural
commodities. In these cases the production of one does not depend
upon the other; both depend upon the increase in rainfall. Such a
relationship, which involves no direct inter-relationship but a
general relationship through some other characteristic, is called
a General Relationship.


At present we shall consider only two characteristics,
for the sake of simplicity and easy understanding. Of the
two characteristics, one will be considered as the independent
variable and the other as the dependent variable. As the names
suggest, the value of the dependent variable depends upon the value
of the independent variable, so any variation in the value of the
independent variable will have an impact on the value of the
dependent variable. If such a relationship exists between the
changes in the values of the variables, we say that they are
correlated. The independent variable will be denoted by the
letter x and the dependent variable by the letter y.


Types of correlation 

There are two types of correlation, namely positive and negative
correlation.

Positive correlation

Whenever the value of the independent variable increases,
the value of the dependent variable for the corresponding unit
also increases; or when the value of the independent variable
decreases, the value of the dependent variable also decreases.
In this case the changes in the values of the two variables
take place in the same direction, i.e. either both increase
or both decrease simultaneously. This is called positive
correlation. Price and supply of a commodity are a good example
of this type: whenever the price of a commodity increases, the
supply of the commodity also increases, and whenever the price
of a commodity decreases, the supply also decreases.


Negative correlation 


On the other hand, if the value of the dependent variable
decreases when the value of the independent variable increases,
or increases when the value of the independent variable decreases,
the changes in the values of the variables take place in opposite
directions, and this is called Negative Correlation. Price and
demand of a commodity can be cited as an example: whenever the
price of the commodity increases, its demand may decrease, and
vice versa.


Perfect positive correlation 

The relationship between changes in the temperature and
in the length of an iron bar is a suitable illustration of perfect
positive correlation. We know that the length of the iron bar
increases with an increase in temperature, and thus it indicates
a positive correlation. But we also know that for every
one-degree rise in temperature, the length of the iron bar increases
by a fixed amount. This shows a uniform increase in the value
of the dependent variable for a uniform increase in the value
of the independent variable. In other words, the rate of change of
one variable is uniform with respect to the other, showing a
one-to-one correspondence in the rate of change. Because of this
one-to-one correspondence, the correlation is said to be perfect
positive, and it is indicated by +1. On the other hand, if a similar
but opposite rate of change takes place in the value of the dependent
variable corresponding to the rate of change in the value of the
independent variable, it is called a perfect negative correlation
and is indicated by -1. The volume of a gas at constant temperature
decreases in a definite ratio when the pressure increases,
because PV is always constant. This shows that the indicator
of correlation may vary from -1 to +1 through 0.
If there is no correlation between the values of two variables,
the indicator of correlation is 0.
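To see the indicator values +1 and -1 concretely, the sketch below computes Karl Pearson's coefficient (described later in this chapter) for two hypothetical series in which one variable changes by a fixed amount per unit change in the other. The temperatures, the 0.0012 increase per degree and the falling series are all invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


temps = [10, 20, 30, 40, 50]                   # hypothetical temperatures
lengths = [100 + 0.0012 * t for t in temps]    # fixed increase per degree
falling = [50 - 2 * t for t in temps]          # fixed decrease per unit

r_up = pearson_r(temps, lengths)      # perfect positive: +1
r_down = pearson_r(temps, falling)    # perfect negative: -1
print(round(r_up, 6), round(r_down, 6))
```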


Correlation can be studied by any one of the following methods:

1. Scatter Diagram; 2. Correlation Graph; 3. Karl Pearson's
Coefficient of Correlation.


The first two are graphical methods, and from these we can
find out the nature of correlation, i.e. whether the correlation
is positive or negative. A qualitative assessment of the correlation
can thus be given, but it cannot be expressed as a quantitative
measure by the graphical methods. The coefficient of correlation
gives a quantitative measure of the degree of correlation.


Scatter Diagram and Correlation graph 


Scatter Diagram 


It is the simplest method of studying the correlation between
two related variables. Whether the correlation is positive or
negative can be seen with great ease from the diagram. Each pair
of values of x and y is plotted as a point in the XY-plane. The
points corresponding to the units will be scattered throughout
the XY-plane, and so the diagram is called a Scatter Diagram. Since
this diagram depicts the values of two variates, it is also called
a bivariate diagram. If the points tend to cluster along well-defined
curves, the curves are called regression curves, and an association
between the two variables is suggested. If the regression curves
are straight lines, we say the regression is linear.


[Fig. 22: Scatter Diagram]


1. Perfect Positive Correlation

If all the plotted points or dots lie on a straight line running
from left to right in the upward direction, the correlation is said
to be perfect positive. The graph will be as given in Fig. 23.

[Fig. 23: Perfect Positive Correlation]


2. Positive Correlation 


If the points or dots are scattered around a straight line 
running from left to right in an upward direction, instead of all 
lying exactly on a straight line as explained before, the correlation 
is said to be positive. The figure will be as follows: 


[Fig. 24: Positive Correlation]


3. Perfect Negative Correlation

If all the points or dots in a scatter diagram lie on a straight line
running from left to right in a downward direction, the correlation
is said to be perfect negative. The figure will be as follows.

[Fig. 25: Perfect Negative Correlation]
4. Negative Correlation

If the dots or points, instead of all lying on a straight line as in
the above figure, are scattered around a straight line running
downward from left to right, the correlation is said to be negative.
The figure will be as follows:

[Fig. 26: Negative Correlation]


5. No Correlation

If the plotted points do not form a straight line but lie all
over the plane, as in the following figure, this indicates the
absence of any correlation.

[Fig. 27: No Correlation]
Nature of correlation

The nature of correlation can also be determined from the
scatter diagram by another method.

We can find the means $\bar{x}$ and $\bar{y}$ of the x values and the y values
respectively, and plot the point $(\bar{x}, \bar{y})$ in the scatter diagram. By
drawing two perpendicular lines through this point, one to the x-axis
and another to the y-axis, and extending the lines on either side of
the point, we can divide the scatter diagram into four parts. Each
part is called a quadrant. The quadrants are called the first, second,
third and fourth quadrants.

[Fig. 28: The four quadrants about the point (x̄, ȳ). Writing a and b for the sizes of the deviations of x and y, the product of the deviations is a × b = ab in the first quadrant, (-a) × b = -ab in the second, (-a) × (-b) = ab in the third and a × (-b) = -ab in the fourth.]


[Fig. 29: Scatter Diagram]


The number of points in each quadrant should be
counted. If the total number of points in the first and third
quadrants is greater than the total number of points in the second
and fourth quadrants, it indicates positive correlation. If
the total number of points in the first and third quadrants is less
than the total number of points in the second and fourth quadrants,
it indicates negative correlation.


Correlation Graph 


Correlation graph is used where variables are given with
reference to a period of time. Time is marked on the x-axis and
the values of the variables are marked on the y-axis. Let us
consider the following table and the graph drawn for it.


Year     Income (Rs.)     Expenditure (Rs.)
1970       100                90
1971       105                95
1972       115               100
1973       125               115
1974       150               120
1975       178               150
1976       190               175
1977       200               190
1978       205               200


[Fig. 30: Correlation Graph; years 1970 to 1978 on the x-axis, rupees on the y-axis; A = Income curve, B = Expenditure curve]


The income and the expenditure are marked on the Y-axis.
All the points representing the income are joined by straight
line segments. Similarly, all the points representing the
expenditure are joined by straight line segments, giving a line
distinct from the one representing the income.


If the curves of the two variables are very close to each other
and move in the same direction, the variables are said to
be positively correlated. On the other hand, if the curves of the
two variables move in opposite directions, the variables are said
to be negatively correlated. In the above example the curves
for income and expenditure move in the same direction, i.e. a rise
in one curve at a particular point of time corresponds to a rise in
the other curve at that point of time, and a decline in one curve
corresponds to a decline in the other. Hence the two variables
are positively correlated. A correlation graph gives only an
approximate idea of the correlation between the variables; it does
not indicate the magnitude of the relationship.
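The visual judgement "the two curves move in the same direction" can be made explicit by comparing the year-to-year changes of the two series. The sketch below does this for the income and expenditure table above. This simple count is our own illustrative device, not a standard named measure. Like the graph itself, it says nothing about the magnitude of the relationship.

```python
income = [100, 105, 115, 125, 150, 178, 190, 200, 205]      # 1970 to 1978
expenditure = [90, 95, 100, 115, 120, 150, 175, 190, 200]


def changes(series):
    """Year-to-year changes of a time series."""
    return [b - a for a, b in zip(series, series[1:])]


same = opposite = 0
for di, de in zip(changes(income), changes(expenditure)):
    if di * de > 0:
        same += 1          # both rise or both fall
    elif di * de < 0:
        opposite += 1      # one rises while the other falls

print(same, opposite)      # 8 0
```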
A correlation graph can be drawn for the following data
and the relationship between the variables studied. This can be
attempted as an exercise by the students.

Year     Export (Rs. in crores)     Import (Rs. in crores)
1970       100                         80
1971       150                         70
1972       175                         65
1973       200                         60
1974       250                         50
1975       300                         45


3. Coefficient of Correlation

The coefficient of correlation is calculated to study the extent,
intensity or degree of correlation that exists between two
variables. The correlation coefficient gives the degree of
correlation in quantitative terms. Karl Pearson's coefficient of
correlation is given by the following formula:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y}$$


The above formula is implicitly based on certain assumptions,
given below:

i. The relationship between the two given variables is assumed
to be linear.

ii. The forces affecting the two variables are assumed to
be related to each other in a relationship of cause and
effect.

iii. The various causes affecting the two variables are common
to both.
In the above formula:

1. r denotes the coefficient of correlation.
2. x and y are the pairs of values representing the two variables
x and y.
3. $\bar{x}$ is the mean value of the x variable.
4. $\bar{y}$ is the mean value of the y variable.
5. $\sigma_x$ is the standard deviation of the x values:
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
6. $\sigma_y$ is the standard deviation of the y values:
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}}$$
7. N (also written n) is the number of items, i.e. the number of pairs.

On simplification the above formula may undergo the following
changes, and we get three more forms of the formula.

$$(1)\quad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{N \sigma_x \sigma_y}$$

In the above formula $\sigma_x$ and $\sigma_y$ can be substituted by their
respective formulae. The formula then undergoes the following changes:
$$(2)\quad r = \frac{\frac{\sum (x - \bar{x})(y - \bar{y})}{N}}{\sqrt{\frac{\sum (x - \bar{x})^2}{N}} \times \sqrt{\frac{\sum (y - \bar{y})^2}{N}}}$$
$$(3)\quad r = \frac{\frac{\sum (x - \bar{x})(y - \bar{y})}{N}}{\sqrt{\frac{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}{N^2}}}$$
$$(4)\quad r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Generally $\sum (x - \bar{x})(y - \bar{y})$ is called the product moment, and
$$p = \frac{\sum (x - \bar{x})(y - \bar{y})}{N}$$
is called the Mean Product Moment (also known as the covariance of x and y). Hence
$$r = \frac{p}{\sigma_x \sigma_y}$$
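The forms above are algebraically identical. The sketch below computes r from formula (1), from formula (4) and as $p/(\sigma_x \sigma_y)$ for a small hypothetical data set, and confirms that the three results agree apart from floating-point rounding. The function name `pearson_forms` is our own.

```python
import math


def pearson_forms(xs, ys):
    """Karl Pearson's r by formula (1), formula (4) and p / (sd_x * sd_y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))    # product moment
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    sd_x, sd_y = math.sqrt(sxx / n), math.sqrt(syy / n)
    r1 = sxy / (n * sd_x * sd_y)                # formula (1)
    r4 = sxy / math.sqrt(sxx * syy)             # formula (4)
    p = sxy / n                                 # mean product moment
    return r1, r4, p / (sd_x * sd_y)


# Hypothetical paired data, for illustration only.
r1, r4, rp = pearson_forms([1, 3, 4, 6, 8], [2, 3, 5, 6, 9])
print(r1, r4, rp)      # three equal values, apart from rounding
```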


Let us examine this with the help of one example. Calculate 
the coefficient of correlation for the following data . 


x (Height in cm)     y (Weight in kg)
  5                     7
  4                     6
  6                     9
  3                     5
  2                     3
Total 20               30

The various steps involved in the computation of r are
enumerated below:


Section I

i. Find the number of pairs of items: n = 5.

ii. Find the mean ($\bar{x}$) of the x values:
$$\bar{x} = \frac{\sum x}{n} = \frac{20}{5} = 4$$

iii. Find the mean ($\bar{y}$) of the y values:
$$\bar{y} = \frac{\sum y}{n} = \frac{30}{5} = 6$$


Section II

i. Find the deviation $(x - \bar{x})$ of each of the x values from
the mean.

ii. In the same way, find the deviation $(y - \bar{y})$ of each of the
y values from the mean.

iii. Calculate the product of the deviation of x from its mean and
the deviation of y from its mean: the deviation of the first value
of x is multiplied by the deviation of the first value of y, the
second by the second, and so on.

(x - x̄)     (y - ȳ)     (x - x̄)(y - ȳ)
  1            1              1
  0            0              0
  2            3              6
 -1           -1              1
 -2           -3              6
Total 0        0             14

$$\text{Mean product moment} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{14}{5} = 2.8$$
Section III

1. Find the standard deviation of x:
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$

(x - x̄)     (x - x̄)²
  1            1
  0            0
  2            4
 -1            1
 -2            4
Total 0       10

$$\sigma_x = \sqrt{\frac{10}{5}} = \sqrt{2}$$

2. Find the standard deviation of y:
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}}$$

(y - ȳ)     (y - ȳ)²
  1            1
  0            0
  3            9
 -1            1
 -3            9
Total 0       20

$$\sigma_y = \sqrt{\frac{20}{5}} = \sqrt{4} = 2$$

Product $\sigma_x \cdot \sigma_y = \sqrt{2} \times 2$
Section IV

Divide the mean product moment by the product $\sigma_x \cdot \sigma_y$:
$$r = \frac{2.8}{\sqrt{2} \times 2} = \frac{2.8}{1.414 \times 2} \approx 0.99 \text{ (approximately 1)}$$
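Sections I to IV can be traced step by step in Python. This minimal sketch follows the same order as the text and uses the example's data, so every intermediate value can be compared with the hand calculation.

```python
import math

x = [5, 4, 6, 3, 2]
y = [7, 6, 9, 5, 3]

# Section I: number of pairs and the two means.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Section II: deviations and the mean product moment.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]
product_moment = sum(a * b for a, b in zip(dx, dy)) / n

# Section III: standard deviations.
sd_x = math.sqrt(sum(a * a for a in dx) / n)
sd_y = math.sqrt(sum(b * b for b in dy) / n)

# Section IV: divide the product moment by sd_x * sd_y.
r = product_moment / (sd_x * sd_y)

print(n, x_bar, y_bar)                  # 5 4.0 6.0
print(product_moment, sd_y)             # 2.8 2.0
print(round(sd_x, 3), round(r, 2))      # 1.414 0.99
```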


The various procedures can be summarised in the following
table:

(1)     (2)     (3)        (4)          (5)        (6)          (7)
x       y       x - x̄     (x - x̄)²    y - ȳ     (y - ȳ)²    (x - x̄)(y - ȳ)
5       7        1         1            1          1            1
4       6        0         0            0          0            0
6       9        2         4            3          9            6
3       5       -1         1           -1          1            1
2       3       -2         4           -3          9            6
Total
20      30       0        10            0         20           14
Mean
4       6        0         2            0          4            2.8


It may be seen that columns 3, 4, 5 and 6 are devised to calculate
the standard deviation of x ($\sigma_x$) and the standard deviation of
y ($\sigma_y$). In this process we adopt the basic formulae for
the calculation of standard deviation:
$$(1)\quad \sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \qquad (2)\quad \sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{N}}$$

But we know that the standard deviation can also be calculated
by the shortcut method, directly from the original values themselves,
using the following formulae:
$$(1)\quad \sigma_x = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} \qquad (2)\quad \sigma_y = \sqrt{\frac{\sum y^2}{N} - \bar{y}^2}$$
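A quick numerical check that the shortcut formula gives the same standard deviation as the basic formula, using the example's x and y values. As a design note, the shortcut subtracts two nearly equal quantities when the values are large relative to their spread. In floating-point work it can therefore lose accuracy, which is why the deviation form is often preferred in software.

```python
import math


def sd_direct(v):
    """Basic formula: sqrt(sum((v - mean)**2) / N)."""
    n = len(v)
    m = sum(v) / n
    return math.sqrt(sum((a - m) ** 2 for a in v) / n)


def sd_shortcut(v):
    """Shortcut formula: sqrt(sum(v**2) / N - mean**2)."""
    n = len(v)
    m = sum(v) / n
    return math.sqrt(sum(a * a for a in v) / n - m * m)


x = [5, 4, 6, 3, 2]
y = [7, 6, 9, 5, 3]
print(sd_direct(x), sd_shortcut(x))   # both equal to sqrt(2)
print(sd_direct(y), sd_shortcut(y))   # both equal to 2
```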
For this we have to calculate the x² and y² columns; these two
columns may replace columns 3, 4, 5 and 6.

The x² values can replace the columns containing (x - x̄) and
(x - x̄)².

Similarly, the y² values can replace the columns containing
(y - ȳ) and (y - ȳ)².

The 7th column, giving the value of (x - x̄)(y - ȳ), can
conveniently be replaced by another column giving the product
of x and y (xy), so as to facilitate the working. Let us see how
this is possible.

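7th column = (x − x̄)(y − ȳ)

The total of col. (7) = Σ(x − x̄)(y − ȳ). Here x̄ and ȳ are constants, and Σx = N x̄, Σy = N ȳ. Hence

$$\Sigma(x-\bar{x})(y-\bar{y}) = \Sigma xy - \bar{y}\,\Sigma x - \bar{x}\,\Sigma y + N\bar{x}\bar{y}$$

$$= \Sigma xy - \bar{y}\cdot N\bar{x} - \bar{x}\cdot N\bar{y} + N\bar{x}\bar{y} = \Sigma xy - N\bar{x}\bar{y}$$

Substituting this, together with the shortcut formulae for σx and σy, in r:

$$r = \frac{\Sigma(x-\bar{x})(y-\bar{y})/N}{\sigma_x\,\sigma_y} = \frac{(\Sigma xy - N\bar{x}\bar{y})/N}{\sqrt{\left(\frac{\Sigma x^2}{N} - \bar{x}^2\right)\left(\frac{\Sigma y^2}{N} - \bar{y}^2\right)}}$$

$$= \frac{(\Sigma xy - N\bar{x}\bar{y})/N}{\sqrt{\frac{(\Sigma x^2 - N\bar{x}^2)(\Sigma y^2 - N\bar{y}^2)}{N^2}}}$$

$$\therefore\; r = \frac{\Sigma xy - N\bar{x}\bar{y}}{\sqrt{(\Sigma x^2 - N\bar{x}^2)(\Sigma y^2 - N\bar{y}^2)}}$$

Or, writing the squares as products,

$$r = \frac{\Sigma xy - N\bar{x}\bar{y}}{\sqrt{(\Sigma xx - N\bar{x}\bar{x})(\Sigma yy - N\bar{y}\bar{y})}}$$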
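The identity Σ(x − x̄)(y − ȳ) = Σxy − N x̄ȳ, and the raw-sum formula for r that it leads to, can be checked numerically on the same five pairs. The sketch below is illustrative only; `pearson_r_raw` is a hypothetical helper name.

```python
from math import sqrt


def pearson_r_raw(xs, ys):
    """r = (Σxy − N·x̄·ȳ) / sqrt((Σx² − N·x̄²)(Σy² − N·ȳ²)), using raw values only."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    num = sxy - n * x_bar * y_bar
    den = sqrt((sxx - n * x_bar ** 2) * (syy - n * y_bar ** 2))
    return num / den


x = [5, 4, 6, 3, 2]
y = [7, 6, 9, 5, 3]

# Identity used in the derivation: Σ(x − x̄)(y − ȳ) = Σxy − N·x̄·ȳ
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
lhs = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
rhs = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
print(lhs, rhs)                        # both sides equal 14.0 for these data
print(round(pearson_r_raw(x, y), 3))
```

The design choice is simply to keep only the running totals Σx, Σy, Σx², Σy² and Σxy, which is what the simplified table does by hand.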
The above expansion makes the position clear. Each factor of the denominator has the same form as the numerator. The only difference is that in the first factor of the denominator we have substituted x and x̄ for y and ȳ respectively, as they occur in the numerator. In the other factor, y and ȳ are substituted for x and x̄ respectively. If this point is kept in mind, we can easily write down the formula.


The above formula, which is the most simplified, contains only the x, y, x², y² and xy columns. It does not involve the calculation of (x − x̄), (y − ȳ), (x − x̄)², (y − ȳ)² and (x − x̄)(y − ȳ). Let us adopt the new formula and calculate the correlation coefficient.


          x      y      x²      y²      xy
          5      7      25      49      35
          4      6      16      36      24
          6      9      36      81      54
          3      5       9      25      15
          2      3       4       9       6
Total    20     30      90     200     134


$$\bar{x} = \frac{\Sigma x}{N} = \frac{20}{5} = 4, \qquad \bar{x}^2 = 4 \times 4 = 16$$

$$\bar{y} = \frac{\Sigma y}{N} = \frac{30}{5} = 6, \qquad \bar{y}^2 = 6 \times 6 = 36$$

(1) Σx = 20   (2) Σy = 30   (3) Σx² = 90   (4) Σy² = 200   (5) Σxy = 134   (6) N = 5
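$$r = \frac{\Sigma xy - N\bar{x}\bar{y}}{\sqrt{(\Sigma x^2 - N\bar{x}^2)(\Sigma y^2 - N\bar{y}^2)}} = \frac{134 - 5 \times 4 \times 6}{\sqrt{(90 - 5 \times 16)(200 - 5 \times 36)}}$$

$$= \frac{134 - 120}{\sqrt{(90 - 80)(200 - 180)}} = \frac{14}{\sqrt{10 \times 20}} = \frac{14}{10\sqrt{2}} \approx \frac{1.4}{1.4} = 1$$

(More precisely, 14 / 14.14 ≈ 0.99.)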


Calculate the value of r in the following case:

S.No. of    Quantity of fertiliser applied    Quantity of yield in the
the field   in the experimental plot (kg)     experimental plot (kg)
                       (x)                              (y)
    1                   0                                3
    2                   5                               17
    3                   7                               22
    4                   9                               26
    5                   8                               25
    6                   6                               19
    7                  10                               32
    8                   4                               11
    9                   3                                9
   10                   2                                7
As we require the quantities Σx², Σy² and Σxy, we have to compute the values of x², y² and xy as follows:


          x      y      x²      y²      xy
          0      3       0       9       0
          5     17      25     289      85
          7     22      49     484     154
          9     26      81     676     234
          8     25      64     625     200
          6     19      36     361     114
         10     32     100    1024     320
          4     11      16     121      44
          3      9       9      81      27
          2      7       4      49      14
Total    54    171     384    3719    1192


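N = 10

$$\Sigma x = 54; \quad \bar{x} = \frac{54}{10} = 5.4; \quad \bar{x}^2 = 5.4 \times 5.4 = 29.16; \quad \Sigma x^2 = 384$$

$$\Sigma y = 171; \quad \bar{y} = \frac{171}{10} = 17.1; \quad \bar{y}^2 = 17.1 \times 17.1 = 292.41; \quad \Sigma y^2 = 3719$$

$$\Sigma xy = 1192$$

$$r = \frac{\Sigma xy - N\bar{x}\bar{y}}{\sqrt{(\Sigma x^2 - N\bar{x}^2)(\Sigma y^2 - N\bar{y}^2)}}$$

Let us substitute the values of x̄, ȳ, Σx², Σy² and Σxy in this expression:

$$r = \frac{1192 - 10 \times 5.4 \times 17.1}{\sqrt{(384 - 10 \times 5.4 \times 5.4)(3719 - 10 \times 17.1 \times 17.1)}} = \frac{1192 - 923.4}{\sqrt{(384 - 291.6)(3719 - 2924.1)}}$$

$$= \frac{268.6}{\sqrt{92.4 \times 794.9}} = \frac{268.6}{271.0} \approx 0.991$$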
It shows that there is a high positive correlation between the application of fertiliser and the yield.
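The whole computation for the fertiliser data can be reproduced in a few lines. This Python sketch is an illustration, not the book's method of working: it applies the same raw-sum formula but carries full precision, so it reports the unrounded value of r.

```python
from math import sqrt

x = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]        # fertiliser applied (kg)
y = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]  # yield (kg)

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))
x_bar, y_bar = sum_x / n, sum_y / n

# r = (Σxy − N·x̄·ȳ) / sqrt((Σx² − N·x̄²)(Σy² − N·ȳ²))
num = sum_xy - n * x_bar * y_bar
den = sqrt((sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))
r = num / den

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 54 171 384 3719 1192
print(f"r = {r:.3f}")
```

The printed totals match the bottom row of the table, and r comes out at about 0.991.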


Shortcut Method

x :   0    5    7    9    8    6   10    4    3    2
y :   3   17   22   26   25   19   32   11    9    7


In the previous methods we had to find the squares of each of the x and y values, and also the products of the x and y values. This is simple when x and y are small. On the other hand, if the values of the variables are large, as in the case of y, squaring and multiplying become laborious. Hence we need a shortcut method in which the values of x and y are reduced, so that we can easily find the squares and products without the help of mathematical tables or calculating machines.


In this shortcut method we shift the base (origin) and so reduce the values. Let us take 6 in the case of the x values and 19 in the case of the y values as arbitrary values. Let us change each x value into u and each y value into v by the following substitution:

$$u = x - 6, \qquad v = y - 19$$

Let us substitute u, v, u², v², uv in place of x, y, x², y², xy respectively in the formula for r:


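$$r = \frac{\Sigma xy - N\bar{x}\bar{y}}{\sqrt{(\Sigma x^2 - N\bar{x}^2)(\Sigma y^2 - N\bar{y}^2)}} = \frac{\Sigma uv - N\bar{u}\bar{v}}{\sqrt{(\Sigma u^2 - N\bar{u}^2)(\Sigma v^2 - N\bar{v}^2)}}$$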


          u       v      u²      v²      uv
         −6     −16      36     256      96
         −1      −2       1       4       2
          1       3       1       9       3
          3       7       9      49      21
          2       6       4      36      12
          0       0       0       0       0
          4      13      16     169      52
         −2      −8       4      64      16
         −3     −10       9     100      30
         −4     −12      16     144      48
Total    −6     −19      96     831     280
$$\bar{u} = \frac{-6}{10} = -0.6; \qquad \bar{v} = \frac{-19}{10} = -1.9$$

$$r = \frac{\Sigma uv - N\bar{u}\bar{v}}{\sqrt{(\Sigma u^2 - N\bar{u}^2)(\Sigma v^2 - N\bar{v}^2)}} = \frac{280 - 10 \times (-0.6) \times (-1.9)}{\sqrt{(96 - 10 \times 0.6^2)(831 - 10 \times 1.9^2)}}$$

$$= \frac{280 - 11.4}{\sqrt{(96 - 3.6)(831 - 36.1)}} = \frac{268.6}{\sqrt{92.4 \times 794.9}} = \frac{268.6}{271.0} \approx 0.991$$


It is seen that the conversion of the values from x into u and from y into v does not affect the value of the correlation coefficient.

In the above shortcut method we have changed the base from 0 to 6 in the case of x and from 0 to 19 in the case of y. This conversion is of the form d = X − A. We can also have other types of conversion, in which there is a change of scale instead of a change of base, or a change of base as well as a change of scale, as detailed below. In all these cases the value of the correlation coefficient is not affected, provided the divisors used for x and y have the same sign; if exactly one of them is negative, only the sign of r is reversed.


(1) Change in the base: $d = X - A$

(2) Change in the scale: $d = \dfrac{X}{C}$

(3) Change in the base and in the scale: $d = \dfrac{X - A}{C}$
Since we have two variables x and y, it would be confusing if we used d for both. Hence we use u and v for x and y respectively:

                                     x                    y
1. Change in the base               u = x − A            v = y − B
2. Change in the scale              u = x / C            v = y / D
3. Change in the base and scale     u = (x − A) / C      v = (y − B) / D

The students can try methods 2 and 3 for any example.
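The claim that a change of base or of scale leaves r unchanged can be tested directly. In this illustrative Python sketch A = 6 and B = 19 are the arbitrary origins used in the shortcut method, while the divisors C = 2 and D = 5 are hypothetical values chosen only for the demonstration; the last case shows that a single negative divisor merely reverses the sign of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    den = sqrt(sum((a - x_bar) ** 2 for a in xs) * sum((b - y_bar) ** 2 for b in ys))
    return num / den


x = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]
y = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]

A, B = 6, 19   # arbitrary origins used in the shortcut method
C, D = 2, 5    # hypothetical scale divisors, for illustration only

r_original = pearson_r(x, y)
r_base = pearson_r([a - A for a in x], [b - B for b in y])               # u = x − A, v = y − B
r_scale = pearson_r([a / C for a in x], [b / D for b in y])              # u = x / C, v = y / D
r_both = pearson_r([(a - A) / C for a in x], [(b - B) / D for b in y])   # u = (x − A)/C, v = (y − B)/D
r_negative = pearson_r([a / -C for a in x], y)                           # one negative divisor

print([round(v, 4) for v in (r_original, r_base, r_scale, r_both)])
print(round(r_negative, 4))
```

The first four values agree; any differences are confined to floating-point noise far below the printed precision.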


Interpretation of Karl Pearson's correlation coefficient

The value of Pearson's coefficient r always lies between −1 and +1.

When r is equal to +1, it indicates perfect positive correlation; when it is equal to −1, it indicates perfect negative correlation; and when it is equal to 0, it indicates no (linear) correlation.


Merits

It reveals the nature (direction) of the correlation between two variables and at the same time gives a numerical measure of the correlation.

Demerits

1. Whether or not the relationship between the two given variables is linear, we assume it to be linear when we calculate Pearson's coefficient of correlation.

2. It takes much time to compute the correlation coefficient.


Caution about the correlation coefficient

It may be noted that the correlation coefficient only expresses association; it does not by itself tell us anything about the causes of the relationship between the variates. From the mere fact that two variates are correlated, we cannot say whether the variation in one variate is the cause or the result of the variation in the other. Nor can we say whether the association is due to the mutual dependence of the two variates or to common causes affecting both of them. Further, a high value of correlation does not always indicate a real relationship between the two variates, since such high values may also be accidental. In certain cases a high value of correlation may exist between two variates such as the number of births in a hospital and the yield of wheat; such correlations are called spurious correlation or nonsense correlation.


Correlation Table

For a single variable we have prepared a frequency table. Similarly, for the simultaneous distribution of two variables we can prepare a table called a correlation table. Generally, a correlation table is a two-way classification. An illustration of a correlation table, showing the number of blocks distributed according to area and population, is given below.


Distribution of number of Blocks in Tamil Nadu according to area and population

                                     Population ('000)
Area in             Less                                    Greater
sq. kilometre       than 40   40-60   60-80   80-100   than 100   Total
Less than 150          -        12       4       7         4        27
150 - 250              4        35      51      24         5       119
250 - 350              1        28      68      30         5       132
350 - 450              1         7      24      20         5        57
450 - 550              -         3       7       7         3        20
550 - 650              -         -       5       7         2        14
650 - 850              -         -       1       2         -         3
Greater than 850       -         -       -       -         2         2
Total                  6        85     160      97        26       374
In a two-way table, particulars of two variables are recorded. If one or both of the variables in a two-way table are qualitative, the table is known as a contingency table. When both the variables are quantitative, the two-way table is called a correlation table.


Construction of a correlation table

A correlation table can also be prepared from raw data. Let us prepare a correlation table for the following data.

Height (cms.) (x)   156  128  145  120  110  112  115  120  140  125  111  105  112
Weight (kg.)  (y)    60   62   75   45   40   49   52   38   75   59   65   42   50

Height (cms.) (x)   115  120  125  117  115  112  119  120  125  130  145  150
Weight (kg.)  (y)    60   63   68   74   75   65   69   70   55   58   60   75
Let us find the maximum and minimum values of x and y.

                   x      y
Maximum value     156     75
Minimum value     105     38
Range              51     37


In the case of x we can have 10 as the class interval and 
in the case of y we can have 5 as the class interval. The various 
classes can be arranged as follows: 


X / Y     35-40  40-45  45-50  50-55  55-60  60-65  65-70  70-75  75-80  Total
100-110     -      1      -      -      -      -      -      -      -      1
110-120     -      1      1      2      -      1      3      1      1      10
120-130     1      -      1      -      2      2      1      1      -      8
130-140     -      -      -      -      1      -      -      -      -      1
140-150     -      -      -      -      -      1      -      -      2      3
150-160     -      -      -      -      -      1      -      -      1      2
Total       1      2      2      2      3      5      4      2      4      25


Since there are 25 items, the grand total of the columns and of the rows is equal to 25.

After the construction of a correlation table of the above type, we have to classify the data. For this purpose we should consider both the x and y values of each item simultaneously. Let us take first the x value, i.e. 156; it falls in the last class, namely 150-160. But there are 9 columns under y, so we have to consider the y value as well. The y value is 60, which falls in the column 60-65. Hence we put a tally mark (/) in the cell against 150-160 under x and 60-65 under y. Similarly, we classify the data for all the items.

After transferring all the items by means of tally marks, as in the classification of data for a single variable, we count the number of tally marks in each cell. The number of tally marks in each cell is the frequency, and it is entered in the corresponding cell of the model table. Finally, a correlation table of the above type emerges from these data.
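The tallying procedure described above amounts to counting how many (x, y) pairs fall into each cell. A minimal Python sketch, assuming (as the text does for the y value 60) that each class includes its lower limit and excludes its upper limit; the helper name `class_of` is our own.

```python
from collections import Counter

# Heights (x, cms.) and weights (y, kg.) of the 25 items, paired by position
heights = [156, 128, 145, 120, 110, 112, 115, 120, 140, 125, 111, 105, 112,
           115, 120, 125, 117, 115, 112, 119, 120, 125, 130, 145, 150]
weights = [60, 62, 75, 45, 40, 49, 52, 38, 75, 59, 65, 42, 50,
           60, 63, 68, 74, 75, 65, 69, 70, 55, 58, 60, 75]


def class_of(value, start, width):
    """Lower limit of the class containing value (upper limits exclusive)."""
    return start + ((value - start) // width) * width


# Tally: one count per (x-class, y-class) cell
cells = Counter((class_of(h, 100, 10), class_of(w, 35, 5))
                for h, w in zip(heights, weights))

x_classes = range(100, 160, 10)   # 100-110, ..., 150-160
y_classes = range(35, 80, 5)      # 35-40, ..., 75-80

print("X / Y   " + " ".join(f"{lo}-{lo + 5}" for lo in y_classes) + "  Total")
row_totals = []
for xl in x_classes:
    row = [cells[(xl, yl)] for yl in y_classes]
    row_totals.append(sum(row))
    print(f"{xl}-{xl + 10} " + " ".join(f"{c or '-':>5}" for c in row) + f"{sum(row):>7}")
col_totals = [sum(cells[(xl, yl)] for xl in x_classes) for yl in y_classes]
print("Total   " + " ".join(f"{c:>5}" for c in col_totals) + f"{sum(col_totals):>7}")
```

The row and column totals it prints reproduce the margins of the correlation table, with a grand total of 25.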


EXERCISE

1. Find the correlation coefficient:

x :    78    89    97    69    59    79
y :   125   137   156   112   107   136

2. Find the correlation coefficient:

x :    50    70    30    10    90   120    80    30
y :    80    90    50    40   190    30    70    90
3. Calculate the correlation coefficient:

(i)   x :   55    35    65    45    75
      y :   55    35    25    45    45

(ii)  x :   28    23    41    35    40
      y :   33    38    34    35    31

4. Calculate r for the following data:

(i)   x :    5     8    10    12     6
      y :    9     8    10     4     6

(ii)  x :    9    15     7    10    20
      y :   25     8    12    13     8


CHAPTER IV

REGRESSION

Regression is allied to correlation. The word means the tendency to return (to go back). In statistics, regression means the average relationship between the variables. The correlation coefficient gives the degree of relationship between the two variables, and it is independent of the units in which the original values are expressed. The square of the correlation coefficient (r²) gives the proportion of the variation in the dependent variable that is accounted for by its (linear) relationship with the other variable. Regression, on the other hand, describes the functional relationship between the two variables.


Purpose of Regression Analysis

The value of one variable (generally the dependent variable) can be estimated from the value of the other (independent) variable with the help of the functional relationship. After estimating the value of one variable in this way, we can also find the deviation between the observed value and the estimated value; this is called the error of our estimate. A summary measure of these errors, computed in the same way as a standard deviation, is called the standard error of estimate. Therefore, three questions are involved in this study:

1. To find the degree of relationship between the two variables, called Correlation (r).

2. To find the functional relationship between the two variables, called Regression.

3. To find the difference between the observed value and the value computed with the help of the functional relationship, and to express it as a measure called the standard error of estimate ($S_{yx}$).
Regression can be expressed graphically or algebraically. The graphic representation of regression is called the Regression Line; the algebraic representation is known as the Regression Equation.

Regression Line

The average relationship between two variables is described by the regression lines. When the value of one variable is given, the most probable value of the other variable is shown by the regression line.

When there are two related variables, there will be two regression lines, one for each variable with reference to the other. Suppose there are two variables x and y. There will be one regression line for x with reference to y, giving the value of x for a given value of y; this line is known as the regression line of x on y. Another line, for y with reference to x, gives the value of y for the corresponding value of x; this is known as the regression line of y on x.


Properties of Regression Lines

(1) If the correlation between the two given variables is perfectly positive, i.e. when the correlation coefficient r is equal to +1, the two regression lines will coincide with each other. That is, there will be only one line instead of two. In such a situation the regression line will be as follows:

FIG. 31  Perfect Positive Regression (axes X and Y)
(2) On the other hand, if the correlation between two variables is perfectly negative, i.e. when the correlation coefficient r is equal to −1, the two regression lines will again coincide, and the line will be as follows:

FIG. 32  Perfect Negative Regression (axes X and Y)


(3) When there is no correlation between the two variables, i.e. when the correlation coefficient r is equal to 0, the two regression lines will intersect each other at right angles, at the point (x̄, ȳ): the regression line of y on x then becomes y = ȳ and that of x on y becomes x = x̄. The lines will be as in figure 33.

FIG. 33  No Correlation (regression lines of x on y and of y on x, meeting at (x̄, ȳ))


Regression Equation

Regression lines are generally given as algebraic expressions known as Regression Equations. The regression equation helps us to draw the regression line. It also gives a numerical method of finding the best estimate of the value of one variable from the given value of the other variable.

When there are two variables x and y, there will be two regression equations, as follows:

1. Regression equation of x on y.
2. Regression equation of y on x.


The two regression equations will be as follows:

(1) Regression equation of x on y:

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

(2) Regression equation of y on x:

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

In these two equations

x̄ = the arithmetic mean of x
ȳ = the arithmetic mean of y
σx = the standard deviation of x
σy = the standard deviation of y
r = the correlation coefficient.

If the value of y is given, we can use equation (1) to find the value of x corresponding to that value of y. If the value of x is given, we can use equation (2) to find the value of y corresponding to that value of x.
Let us calculate the regression equations for the following data:

x :   0    5    7    9    8    6   10    4    3    2
y :   3   17   22   26   25   19   32   11    9    7

As we have to find the value of the correlation coefficient r, we compute the values of x², y² and xy for each item and construct columns for these values, as we did earlier when we studied the correlation coefficient. The final table will be as follows:


          x      y      x²      y²      xy
          0      3       0       9       0
          5     17      25     289      85
          7     22      49     484     154
          9     26      81     676     234
          8     25      64     625     200
          6     19      36     361     114
         10     32     100    1024     320
          4     11      16     121      44
          3      9       9      81      27
          2      7       4      49      14
Total    54    171     384    3719    1192
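$$\bar{x} = \frac{\Sigma x}{N} = \frac{54}{10} = 5.4$$

$$\sigma_x = \sqrt{\frac{\Sigma x^2}{N} - \bar{x}^2} = \sqrt{\frac{384}{10} - 5.4 \times 5.4} = \sqrt{\frac{384 - 10 \times 5.4 \times 5.4}{10}} = \sqrt{\frac{384 - 291.6}{10}} = \sqrt{\frac{92.4}{10}} = \sqrt{9.24} = 3.04$$

$$\bar{y} = \frac{\Sigma y}{N} = \frac{171}{10} = 17.1$$

$$\sigma_y = \sqrt{\frac{\Sigma y^2}{N} - \bar{y}^2} = \sqrt{\frac{3719}{10} - 17.1 \times 17.1} = \sqrt{\frac{3719 - 2924.1}{10}} = \sqrt{\frac{794.9}{10}} = \sqrt{79.49} = 8.92$$

$$r = \frac{\Sigma xy - N\bar{x}\bar{y}}{N\,\sigma_x\,\sigma_y} = \frac{1192 - 10 \times 5.4 \times 17.1}{10 \times \sqrt{9.24} \times \sqrt{79.49}} = \frac{268.6}{\sqrt{92.4 \times 794.9}} = \frac{268.6}{271.0} \approx 0.991$$

We have calculated the following values:

x̄ = 5.4;   σx = 3.04;   ȳ = 17.1;   σy = 8.92;   r = 0.991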


The regression equation of x on y is

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

$$(x - 5.4) = 0.991 \times \frac{3.04}{8.92}\,(y - 17.1) = 0.34\,(y - 17.1)$$

$$x - 5.4 = 0.34y - 0.34 \times 17.1 = 0.34y - 5.81$$

$$x = 0.34y - 5.81 + 5.40 = 0.34y - 0.41$$

The regression equation of y on x is

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

$$(y - 17.1) = 0.991 \times \frac{8.92}{3.04}\,(x - 5.4) = 2.91\,(x - 5.4)$$

$$y - 17.1 = 2.91x - 15.71$$

$$y = 2.91x - 15.71 + 17.1 = 2.91x + 1.39$$
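The two regression equations can also be computed directly from the raw data. This is an illustrative Python sketch; `b_xy` and `b_yx` are our own names for the regression coefficients of x on y and of y on x. Because it keeps full precision throughout, its constant terms may differ in the second decimal place from hand-worked figures in which the coefficients are rounded before multiplying.

```python
from math import sqrt

x = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]
y = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sigma_x = sqrt(sum(v * v for v in x) / n - x_bar ** 2)   # shortcut formula for σx
sigma_y = sqrt(sum(v * v for v in y) / n - y_bar ** 2)   # shortcut formula for σy
r = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (n * sigma_x * sigma_y)

b_xy = r * sigma_x / sigma_y   # regression coefficient of x on y
b_yx = r * sigma_y / sigma_x   # regression coefficient of y on x

# x − x̄ = b_xy (y − ȳ)  →  x = b_xy·y + (x̄ − b_xy·ȳ); similarly for y on x
print(f"x = {b_xy:.2f}y {x_bar - b_xy * y_bar:+.2f}")
print(f"y = {b_yx:.2f}x {y_bar - b_yx * x_bar:+.2f}")
```

Rearranging each equation into the form "slope × variable + constant" is the same step as the last line of each hand-worked derivation.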


It is now clear that we can form the regression equation of y on x and the regression equation of x on y provided we are given the following values: x̄, ȳ, σx, σy and r.


Let us write down the equations from the following data:

x̄ = 5;   ȳ = 7;   σx = 1;   σy = 2;   r = 0.7

(1) The regression equation of x on y is:

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

$$(x - 5) = 0.7 \times \frac{1}{2}\,(y - 7) = 0.35\,(y - 7)$$

$$x - 5 = 0.35y - 2.45$$

$$x = 0.35y - 2.45 + 5 = 0.35y + 2.55$$

(2) The regression equation of y on x is:

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

$$(y - 7) = 0.7 \times \frac{2}{1}\,(x - 5) = 1.4\,(x - 5)$$

$$y - 7 = 1.4x - 7$$

$$y = 1.4x$$
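The same procedure, starting from summary values rather than raw data, fits in one small function. The name `regression_lines` and its (slope, constant) return layout are illustrative choices, not taken from the text.

```python
def regression_lines(x_bar, y_bar, sigma_x, sigma_y, r):
    """Return (slope, constant) of the regression of x on y and of y on x."""
    b_xy = r * sigma_x / sigma_y          # x − x̄ = b_xy (y − ȳ)
    b_yx = r * sigma_y / sigma_x          # y − ȳ = b_yx (x − x̄)
    x_on_y = (b_xy, x_bar - b_xy * y_bar)
    y_on_x = (b_yx, y_bar - b_yx * x_bar)
    return x_on_y, y_on_x


(m1, c1), (m2, c2) = regression_lines(5, 7, 1, 2, 0.7)
print(f"x = {m1:.2f}y + {c1:.2f}")   # x = 0.35y + 2.55
print(f"y = {m2:.2f}x + {c2:.2f}")   # y = 1.40x + 0.00
```

Only the five summary values x̄, ȳ, σx, σy and r are needed, exactly as stated above.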


Regression Coefficient

When we studied correlation, we came across the coefficient of correlation, or correlation coefficient, which indicates the intensity of the relationship between the two variables. Now, when we consider regression, we will have another coefficient called the regression coefficient.

While the coefficient of correlation indicates the intensity of the relationship, the regression coefficient indicates the functional relationship between the two variables. The functional relationship is more important than the intensity of the relationship, since it helps us to calculate the value of one variable from the other. Hence the regression coefficient plays a greater part in the study of economic problems, to estimate or forecast future values of one item corresponding to the values of another item. The students should therefore clearly understand the subtle difference between the intensity of the relationship and the functional relationship.


We can quote one illustration from human relationships, even though it may not fully explain the difference.

When we say that A is a relative of B, we know that they are related, but we do not know the intensity (closeness) of their relationship. On the other hand, if we say that A is a cousin of B, we understand that their relation is very close and not distant. This corresponds to the correlation coefficient.

We can analyse the statement further. Instead of saying that A is the cousin of B, we can say that A is the son of B's father's brother. This indicates their functional relationship, and perhaps corresponds to the regression coefficient. From this relationship we can also work out the relationship between A's son and B's son.


We have considered two equations:

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) \qquad (1)$$

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) \qquad (2)$$

The first equation is the regression equation of x on y and the second is the regression equation of y on x.

Let us consider the factors. The factor $r\,\frac{\sigma_x}{\sigma_y}$ in the first equation is called the regression coefficient of x on y. Similarly, the factor $r\,\frac{\sigma_y}{\sigma_x}$ in the second equation is the regression coefficient of y on x.


Regression coefficient of x on y = $r\,\frac{\sigma_x}{\sigma_y}$

When there is a change of one unit in the value of y, the estimated value of x changes by an amount equal to $r\,\frac{\sigma_x}{\sigma_y}$. In other words, it indicates the amount of change in the value of x corresponding to a change of one unit in the value of y.

Regression coefficient of y on x = $r\,\frac{\sigma_y}{\sigma_x}$

When there is a change of one unit in the value of x, the estimated value of y changes by an amount equal to $r\,\frac{\sigma_y}{\sigma_x}$. In other words, $r\,\frac{\sigma_y}{\sigma_x}$ indicates the change in the value of y corresponding to a change of one unit in the value of x.


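We know that

$$r = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\,\sigma_x\,\sigma_y}$$

Regression coefficient of x on y:

$$r\,\frac{\sigma_x}{\sigma_y} = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\,\sigma_x\,\sigma_y} \times \frac{\sigma_x}{\sigma_y} = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\,\sigma_y^2}$$

Similarly, the regression coefficient of y on x:

$$r\,\frac{\sigma_y}{\sigma_x} = \frac{\Sigma(x-\bar{x})(y-\bar{y})}{N\,\sigma_x^2}$$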


Computation of the correlation coefficient (r) from the Regression Coefficients

While we have two regression coefficients, i.e. (1) the regression coefficient of x on y and (2) the regression coefficient of y on x, we have only one correlation coefficient (r). Let us distinguish the two regression coefficients by the letters m₁ and m₂.

Let m₁ represent the regression coefficient of x on y: $m_1 = r\,\frac{\sigma_x}{\sigma_y}$

Let m₂ represent the regression coefficient of y on x: $m_2 = r\,\frac{\sigma_y}{\sigma_x}$
Let us now find the value of their product:

$$m_1 \times m_2 = r\,\frac{\sigma_x}{\sigma_y} \times r\,\frac{\sigma_y}{\sigma_x} = r^2$$

$$\therefore\; r = \pm\sqrt{m_1 m_2}$$

It is now clear that the correlation coefficient is the square root of the product of the two regression coefficients (their geometric mean). The square root has two values, one positive and the other negative. If both the regression coefficients are positive, take the positive root for r; when both are negative, take the negative root. (The two regression coefficients always have the same sign as r, since σx and σy are positive.)


Calculate the correlation coefficient from the following details:

Regression coefficient of x on y = 0.9
Regression coefficient of y on x = 0.4

The product of the two regression coefficients = 0.9 × 0.4 = 0.36 = r²

Correlation coefficient r = √0.36 = 0.6 (positive, since both regression coefficients are positive)


From the above result, it is clear that we can calculate either regression coefficient provided we are given the value of r and the other regression coefficient.

Calculate the regression coefficient of y on x from the following data:

r = 0.6
Regression coefficient of x on y = 0.9

Let the regression coefficient of y on x be m. Then

$$m \times 0.9 = 0.6 \times 0.6 = 0.36$$

$$m = \frac{0.36}{0.9} = 0.4$$
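The two results above, r = ±√(m₁m₂) and the recovery of a missing regression coefficient from r, can be sketched as follows. The function names are hypothetical; the sign rule follows the text (both coefficients share the sign of r).

```python
from math import copysign, sqrt


def r_from_regression_coefficients(m1, m2):
    """r = ±sqrt(m1·m2), taking the common sign of the two coefficients."""
    if m1 * m2 < 0:
        raise ValueError("regression coefficients must have the same sign")
    return copysign(sqrt(m1 * m2), m1)


def other_coefficient(r, m_known):
    """Since m1·m2 = r², the unknown coefficient is r² / m_known."""
    return r * r / m_known


print(round(r_from_regression_coefficients(0.9, 0.4), 4))   # 0.6
print(round(other_coefficient(0.6, 0.9), 4))                 # 0.4
```

Raising an error for coefficients of opposite sign reflects the fact that no real r can produce them.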


Regression coefficient and Ratio of the Standard Deviations

In the previous case we multiplied the two regression coefficients and proved that their product is equal to the square of the correlation coefficient, r². Now let us consider the ratio of the two regression coefficients:

$$\frac{\text{Regression coefficient of } x \text{ on } y}{\text{Regression coefficient of } y \text{ on } x} = \frac{r\,\sigma_x/\sigma_y}{r\,\sigma_y/\sigma_x} = \frac{\sigma_x^2}{\sigma_y^2}$$
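Both relations, m₁ · m₂ = r² and m₁ / m₂ = σx² / σy², can be confirmed on the fertiliser data using the deviation forms m₁ = Σ(x − x̄)(y − ȳ)/(Nσy²) and m₂ = Σ(x − x̄)(y − ȳ)/(Nσx²). This Python check is illustrative only.

```python
x = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]
y = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))   # Σ(x − x̄)(y − ȳ)
var_x = sum((a - x_bar) ** 2 for a in x) / n                   # σx²
var_y = sum((b - y_bar) ** 2 for b in y) / n                   # σy²

m1 = s_xy / (n * var_y)                  # regression coefficient of x on y
m2 = s_xy / (n * var_x)                  # regression coefficient of y on x
r = s_xy / (n * (var_x * var_y) ** 0.5)  # correlation coefficient

print(f"m1 = {m1:.4f}, m2 = {m2:.4f}")
print(f"m1*m2 = {m1 * m2:.4f}, r^2 = {r * r:.4f}")
print(f"m1/m2 = {m1 / m2:.4f}, sx^2/sy^2 = {var_x / var_y:.4f}")
```

Each printed pair agrees, as the algebra predicts.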


Regression equations and straight lines

We have studied earlier in this chapter the construction of regression equations with the help of the correlation coefficient and the standard deviations of the two variables. The regression equations are:

Regression equation of x on y:

$$(x - \bar{x}) = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y})$$

Regression equation of y on x:

$$(y - \bar{y}) = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x})$$

Afterwards we considered the computation of the two regression coefficients from the value of r and the standard deviations of the two variables:
Regression coefficient of x on y = $r\,\frac{\sigma_x}{\sigma_y}$

Regression coefficient of y on x = $r\,\frac{\sigma_y}{\sigma_x}$

Generally it is usual to construct one regression equation instead of two, and one regression coefficient instead of two. The general convention is to construct the regression coefficient (and equation) of y on x, because the x values are treated as the independent variable and the y values as the dependent variable.

The methods we have considered earlier for the construction of the regression equation may appear rather roundabout. There is still another method, in which the regression coefficient is computed first and the regression equation formed afterwards. In this method the important assumption made is that the relationship between the two variables is linear, so that the regression equation represents a straight line. Before we proceed further, let us study the equation of a straight line as in co-ordinate geometry.


y = mx + c is the general equation of a straight line, where m represents the slope of the line (the tangent of the angle the line makes with the X-axis) and c represents the intercept made by the line on the Y-axis, measured from the origin.

FIG. 34  Straight Line and Axes (a point P(x, y) on the line; intercept c on the Y-axis)
Having assumed that the functional relationship between the two variables is linear, i.e. of the form of the equation of a straight line, we should construct the exact equation by finding the values of the two unknowns m and c in the equation y = mx + c. Once these two values are estimated, the exact equation is determined. We shall see how these two values are computed.

Since there are two unknowns, m and c, we need two simultaneous equations, both involving m and c. Let us construct these two equations first.


Let y = mx + c be the first equation (1).

Let us multiply it throughout by x. We get xy = mx² + cx, and let this be the second equation (2).

$$y = mx + c \qquad (1)$$

$$xy = mx^2 + cx \qquad (2)$$

Let us now consider the first equation, y = mx + c.


There are many items involving x and y values; let us take five values in our example:

x :  x₁, x₂, x₃, x₄, x₅
y :  y₁, y₂, y₃, y₄, y₅

After forming the regression equation y = mx + c, we can compute the y value corresponding to each of the x values x₁, x₂, x₃, x₄, x₅ by substituting them for x in the equation y = mx + c. For every x value we then have two y values: the value of y given in the problem, and the y value computed by us. Let us distinguish these two y values as y_o and y_c, where y_o denotes the observed value of y and y_c the computed value of y. The final position will be as follows:
     (1)             (2)                    (3)
      x       Observed value y_o     Computed value y_c
      x₁             y₁              y_c1 = mx₁ + c
      x₂             y₂              y_c2 = mx₂ + c
      x₃             y₃              y_c3 = mx₃ + c
      x₄             y₄              y_c4 = mx₄ + c
      x₅             y₅              y_c5 = mx₅ + c


Let us add up columns (2) and (3) and equate their totals:

$$y_1 + y_2 + y_3 + y_4 + y_5 = (mx_1 + c) + (mx_2 + c) + (mx_3 + c) + (mx_4 + c) + (mx_5 + c)$$

$$= m(x_1 + x_2 + x_3 + x_4 + x_5) + 5c$$

$$\Sigma y = m\,\Sigma x + 5c \qquad (1)$$


Let us now multiply each of the observed values and the computed values by the corresponding x values x₁, x₂, x₃, x₄ and x₅ respectively. The equations will be as follows:

$$x_1 y_1 = m x_1^2 + c x_1$$
$$x_2 y_2 = m x_2^2 + c x_2$$
$$x_3 y_3 = m x_3^2 + c x_3$$
$$x_4 y_4 = m x_4^2 + c x_4$$
$$x_5 y_5 = m x_5^2 + c x_5$$

Adding,

$$x_1y_1 + x_2y_2 + x_3y_3 + x_4y_4 + x_5y_5 = m(x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2) + c(x_1 + x_2 + x_3 + x_4 + x_5)$$

$$\Sigma xy = m\,\Sigma x^2 + c\,\Sigma x \qquad (2)$$


Now we have two equations as follows:

$$\Sigma y = m\,\Sigma x + 5c \qquad (1)$$

$$\Sigma xy = m\,\Sigma x^2 + c\,\Sigma x \qquad (2)$$

The number 5 appearing in the first equation represents the number of items. Generally the number of items is represented by N. Hence we can rewrite the equations as follows:

$$\Sigma y = m\,\Sigma x + Nc \qquad (1)$$

$$\Sigma xy = m\,\Sigma x^2 + c\,\Sigma x \qquad (2)$$

These two equations, which help us to find the values of m and c, are called the Normal Equations.


From the values given, we can easily obtain Σx and Σy by adding them. What we require further are Σx² and Σxy. Of these, Σx² can be obtained by squaring each of the x values and adding them. Similarly, Σxy can be obtained by multiplying each of the x values by the corresponding y value and summing up the products. We can make the position clear with the help of an example.
S.No.      x      y      x²      xy
  1        0      3       0       0
  2        5     17      25      85
  3        7     22      49     154
  4        9     26      81     234
  5        8     25      64     200
  6        6     19      36     114
  7       10     32     100     320
  8        4     11      16      44
  9        3      9       9      27
 10        2      7       4      14
Total     54    171     384    1192
          Σx     Σy     Σx²     Σxy


Let us substitute these values (with N = 10) in the above two normal equations:

$$54m + 10c = 171 \qquad (1)$$

$$384m + 54c = 1192 \qquad (2)$$


We can solve these two simultaneous equations to get the values of m and c. In this process the coefficient of one of the unknowns, m or c, is made the same in both equations. Let us make the coefficient of c the same in both. For this purpose we multiply the first equation throughout by 54 and the second equation throughout by 10. The two equations become:

$$54 \times 54m + 54 \times 10c = 171 \times 54 \qquad \dots (1)$$
$$384 \times 10m + 54 \times 10c = 1192 \times 10 \qquad \dots (2)$$

In these two equations the terms containing c are the same and have the same sign. Hence, on subtracting equation (1) from equation (2), the term containing c vanishes:

$$3840m = 11920 \qquad \dots (2)$$
$$2916m = 9234 \qquad \dots (1)$$
$$924m = 2686$$
$$m = \frac{2686}{924} = 2.9074$$


We can substitute this value of m in either of the equations. Let us substitute it in equation (1):

$$54m + 10c = 171$$
$$54 \times 2.9074 + 10c = 171$$
$$156.9996 + 10c = 171$$
$$10c = 171 - 156.9996 = 14.0004$$
$$c = \frac{14.0004}{10} \approx 1.4$$

The required equation is y = 2.9074x + 1.4
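The elimination above can be reproduced exactly with Python's fractions module. This is a sketch; the function name solve_normal_equations is hypothetical. Working in exact fractions gives m = 1343/462 ≈ 2.9069 and c = 108/77 ≈ 1.4026. These differ slightly from the hand-worked values 2.9074 and 1.4 used in the text, but for every x in the example the two lines give estimates of y that differ by less than 0.003.

```python
from fractions import Fraction


def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve  sum_y = m*sum_x + n*c  and  sum_xy = m*sum_x2 + c*sum_x.

    c is eliminated as in the text: multiply (1) by sum_x and (2) by n,
    then subtract (1) from (2).
    """
    denominator = n * sum_x2 - sum_x * sum_x        # 924 in the example
    if denominator == 0:
        raise ValueError("all x values are equal; the line is not determined")
    m = Fraction(n * sum_xy - sum_x * sum_y, denominator)   # 2686/924 in the example
    c = (Fraction(sum_y) - m * sum_x) / n           # back-substitute in (1)
    return m, c


m, c = solve_normal_equations(10, 54, 171, 384, 1192)
print(m, c)                                     # 1343/462 108/77
print(round(float(m), 4), round(float(c), 4))   # 2.9069 1.4026
```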


We can summarise as follows:

(1) The line of best fit is called the Regression Line.
(2) The equation of the Regression Line is called the Regression Equation.
(3) The constant m, the coefficient of x, is called the Regression Coefficient.
(4) The value of y estimated from the Regression Equation for a particular value of x is called the regression of y on x.
(5) The constant c is the value of y when x is equal to 0.
(6) The two equations used for computing the values of m and c are called the Normal Equations.


Computation of the dependent value with the help of the Regression Equation

Having decided on the functional equation, we can estimate the value of y for each of the given values of x. In our problem the Regression Equation is

$$y = 2.9074x + 1.4$$

Substituting each value of x in this equation gives the estimated value of y. Let the computed value be denoted by y_c. The values obtained are given below.


 x     y_c = 2.9074x + 1.4
 0     2.9074 × 0 + 1.4  =   1.40
 5     2.9074 × 5 + 1.4  =  15.94
 7     2.9074 × 7 + 1.4  =  21.75
 9     2.9074 × 9 + 1.4  =  27.57
 8     2.9074 × 8 + 1.4  =  24.66
 6     2.9074 × 6 + 1.4  =  18.85
10     2.9074 × 10 + 1.4 =  30.47
 4     2.9074 × 4 + 1.4  =  13.03
 3     2.9074 × 3 + 1.4  =  10.12
 2     2.9074 × 2 + 1.4  =   7.21
Total                      171.00
We have two sets of y values for each of the x values, namely the observed value y_o and the computed value y_c. We can test the validity of our equation indirectly by calculating the difference between the two sets of values of y.


 y_o     y_c     d = y_o - y_c    d² = (y_o - y_c)²
   3     1.40        1.60             2.5600
  17    15.94        1.06             1.1236
  22    21.75        0.25             0.0625
  26    27.57       -1.57             2.4649
  25    24.66        0.34             0.1156
  19    18.85        0.15             0.0225
  32    30.47        1.53             2.3409
  11    13.03       -2.03             4.1209
   9    10.12       -1.12             1.2544
   7     7.21       -0.21             0.0441
 171   171.00        0.00            14.1094


We find that the total difference is 0.00. The total deviation is 0 because the positive and negative deviations cancel each other. This can be overcome by squaring the differences, when all of them become positive.
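The deviations and their squares can be checked in Python. This sketch uses the text's equation y = 2.9074x + 1.4 but does not round the computed values to two decimals, so its totals differ slightly from the hand-rounded table: Σd comes out at about 0.0004 and Σd² at about 14.10.

```python
# Observed data from the worked example
xs = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]
ys = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]

# Regression equation obtained in the text: y_c = 2.9074 x + 1.4
m, c = 2.9074, 1.4

y_c = [m * x + c for x in xs]              # computed (estimated) values, unrounded
d = [y - yc for y, yc in zip(ys, y_c)]     # deviations y_o - y_c

sum_d = sum(d)                             # positive and negative deviations cancel
sum_d2 = sum(di * di for di in d)          # squaring makes every term positive

print(round(sum_d, 4), round(sum_d2, 4))   # 0.0004 14.0996
```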


 y_o    y_o²     y_c       y_c²       (y_o - y_c)²
   3       9     1.40      1.9600       2.5600
  17     289    15.94    254.0836       1.1236
  22     484    21.75    473.0625       0.0625
  26     676    27.57    760.1049       2.4649
  25     625    24.66    608.1156       0.1156
  19     361    18.85    355.3225       0.0225
  32    1024    30.47    928.4209       2.3409
  11     121    13.03    169.7809       4.1209
   9      81    10.12    102.4144       1.2544
   7      49     7.21     51.9841       0.0441
 171    3719   171.00   3705.2494      14.1094


We find that the sum of the squares of the observed values of y is equal to the sum of the squares of the computed values of y plus the sum of the squares of the differences between the observed and computed values:

$$\sum y_o^2 = \sum y_c^2 + \sum (y_o - y_c)^2$$
$$3719 \approx 3705.2494 + 14.1094 = 3719.3588$$

The actual difference is only about 0.36, which can be ignored for all practical purposes. Hence the difference can be taken as 0. Therefore the equation can also be rewritten as follows:

$$\sum (y_o - y_c)^2 = \sum y_o^2 - \sum y_c^2$$
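For an exact least-squares line the relation Σy_o² = Σy_c² + Σ(y_o - y_c)² holds exactly: the normal equations force Σd = 0 and Σxd = 0, so the cross term 2Σy_c d vanishes. The small discrepancy found above comes from rounding. The sketch below, assuming the exact solutions m = 2686/924 and c = 1296/924 of this example's normal equations (written in lowest terms), checks the identity with exact fractions.

```python
from fractions import Fraction

xs = [0, 5, 7, 9, 8, 6, 10, 4, 3, 2]
ys = [3, 17, 22, 26, 25, 19, 32, 11, 9, 7]

# Exact least-squares coefficients of the worked example
m = Fraction(1343, 462)   # = 2686/924
c = Fraction(108, 77)     # = 1296/924

y_c = [m * x + c for x in xs]
total_sq = sum(Fraction(y) ** 2 for y in ys)                 # sum of y_o^2
fitted_sq = sum(yc ** 2 for yc in y_c)                       # sum of y_c^2
resid_sq = sum((y - yc) ** 2 for y, yc in zip(ys, y_c))      # sum of (y_o - y_c)^2

print(total_sq)                          # 3719
print(total_sq == fitted_sq + resid_sq)  # True
print(round(float(resid_sq), 4))         # 14.0996
```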
Standard Error of Estimate

y_o - y_c = the deviation between the observed value and the estimated value of y.

(y_o - y_c)² = the square of the deviation between the two values.

Σ(y_o - y_c)² = the sum of the squares of the deviations between the observed and estimated values, taken over all the items.

Σ(y_o - y_c)²/N = the mean of the squared deviations between the observed and estimated values of y.

√[Σ(y_o - y_c)²/N] = the square root of this mean, a kind of average deviation between the observed value and the estimated value.

This is an average of the errors in the estimates, and hence it is called the standard error of estimate. In short, it is the standard deviation of the deviations (differences) d = y_o - y_c:

$$\sigma_d = \sqrt{\frac{\sum d^2}{N} - \bar{d}^{\,2}} = \sqrt{\frac{\sum d^2}{N}}, \qquad \text{since } \bar{d} = 0 \text{ (almost)}$$
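A minimal Python sketch of the standard error of estimate, applied to the deviations d listed in the earlier table; the function name is illustrative. It keeps the general form √(Σd²/N - d̄²), which reduces to √(Σd²/N) here because d̄ is (almost) 0.

```python
import math


def standard_error_of_estimate(deviations):
    """Standard deviation of d = y_o - y_c: sqrt(sum d^2 / N - (mean d)^2)."""
    n = len(deviations)
    mean_d = sum(deviations) / n
    mean_sq = sum(d * d for d in deviations) / n
    return math.sqrt(mean_sq - mean_d ** 2)


# Deviations d = y_o - y_c from the table above (their sum is 0)
d = [1.60, 1.06, 0.25, -1.57, 0.34, 0.15, 1.53, -2.03, -1.12, -0.21]
print(round(standard_error_of_estimate(d), 4))   # 1.1878
```

With Σd² = 14.1094 and N = 10, this is √1.41094 ≈ 1.19.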


Importance of Regression Analysis 


1. One of the important uses of regression is prediction or forecasting. Hence greater importance is given to regression than to correlation. It has great use in economics.

2. Regression is used in the study of supply and demand in relation to changes in prices. It is also useful in estimating the likely increase in consumption and savings corresponding to an increase in income.

3. It is used in the study of savings and investment.

4. Estimates of the yield of crops due to changes in weather conditions are made with the help of regression.

5. Estimates of changes in public income due to changes in the rates of taxation, and of changes in bank deposits and bank loans due to changes in the rate of interest, are made by regression analysis.


Difference between Correlation and Regression

1. Correlation gives the nature and degree of the relationship between two variables, whereas regression gives the average change in the value of one variable corresponding to a change in the value of the other. Regression expresses the relationship between the two in a functional form (an equation).

2. The correlation between x and y is the same as the correlation between y and x. But the regression of x on y is not the same as the regression of y on x.


Exercise 


(1) Fit a straight line to the following data.


X 


y 


X 


5 


8 


10 


12 


6 


9 


gelen . 13.10.11 


4 


6 . 


15 


1 




7 


10 


20 


25 


8 


12 


13 


8 
(2) The regression lines of Y on X and X on Y are given below. Find the correlation coefficient between X and Y. Find also the ratio σx : σy.

(a) Y = 0.80X + 25
    X = 0.45Y + 30

(b) X + 2Y = 5
    2X + 3Y = 8


(3) From the given values, find the two regression equations.

X:    70    80    90    60    50    70    60    60
Y:   120   130   150   110   100   130   120   100


(4) The heights of fathers and sons are given below. Find the regression coefficient and estimate the height of the son when the height of the father is 164 cm.

Father's Ht.    Son's Ht.


160 


180 


165 


160 


162 


170 
180 


158 


168 


160 
(5) Find the regression equations and obtain the best estimate of X when Y = 7 and the best estimate of Y when X = 5, given:

Mean of X = 10, Mean of Y = 20, σx = 1.5, σy = 2, r = 0.7


(6) Given the means of X and Y as 65 and 67, their standard deviations as 2.5 and 3.5, and the coefficient of correlation as 0.8:

(i) Find the two regression equations.

(ii) Find the best estimate of X when Y = 70.


CHAPTER V 


RANK CORRELATION 


The relationship between two variables can be studied in the following two ways:

1. We can study the relationship between the actual values of the two variables.

2. We can study the relationship between the ranks of the values of the two variables.


Ranking

An ordered arrangement of objects is called ranking. Consider a set of individuals arranged in order according to some quality. Ranked data may arise from characteristics which are believed to be measurable in theory but which cannot be measured in practice. Intelligence, complexion, etc., come under this category.

Let us consider the marks obtained by 10 students in two subjects, namely language and science, and let us also rank the students in each subject on the basis of the marks given below.


If a student is uniformly good in both subjects, he would get the same marks in both. If this ideal situation applied to each and every student, each of them would get the same marks in both subjects; in other words, each one would get the same rank in both subjects. Sometimes the marks obtained in the two papers may not be the same but may differ from each other, and yet the ranks obtained in each subject may be the same. In such situations the difference between the two ranks obtained by a student in the two subjects will be 0. In the ideal situation explained above, the difference in ranks will be 0 for each and every student, and hence the total of the differences will also be 0.


S.No.  Marks in   Rank   Marks in   Rank   Difference in   Square of the
       language   (R1)   science    (R2)   the ranks       difference
                                           d = R1 - R2     d²
(1)    (2)        (3)    (4)        (5)    (6)             (7)
 1      68         2      65         4        -2              4
 2      70         1      60         5        -4             16
 3      65         3      75         1         2              4
 4      55         5      50         7        -2              4
 5      50         6      70         3         3              9
 6      60         4      73         2         2              4
 7      48         7      55         6         1              1
 8      44         9      48         8         1              1
 9      40        10      45         9         1              1
10      45         8      42        10        -2              4
TOTAL             55                55         0             48


But such an ideal situation may not exist. Naturally there will be differences between the ranks obtained by a student. Yet even if the ranks obtained by each student in the two subjects differ, the total of the differences will still be 0, because each set of ranks consists of the numbers 1 to n and so has the same total. Because of this peculiar situation, a suitable formula has been designed for determining the rank correlation. The following formula is adopted:

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6\sum d^2}{n(n + 1)(n - 1)}$$

where d represents the difference between the two ranks and n represents the number of students.


In the problem given,

$$R = 1 - \frac{6 \times 48}{10 \times (100 - 1)} = 1 - \frac{288}{10 \times 99} = 1 - \frac{288}{990} = \frac{702}{990} = +0.71$$

The same illustration can be interpreted as the marks given by two examiners to the same students in the same subject.
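The ranking and the formula can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes there are no tied marks (ties would need average ranks, which are not discussed here).

```python
def descending_ranks(values):
    """Rank values so that the largest gets rank 1 (assumes no tied values)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled here")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def rank_correlation(ranks_a, ranks_b):
    """R = 1 - 6*sum(d^2) / (n*(n^2 - 1)), with d the difference of ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


language = [68, 70, 65, 55, 50, 60, 48, 44, 40, 45]
science = [65, 60, 75, 50, 70, 73, 55, 48, 45, 42]

r1 = descending_ranks(language)   # [2, 1, 3, 5, 6, 4, 7, 9, 10, 8]
r2 = descending_ranks(science)    # [4, 5, 1, 7, 3, 2, 6, 8, 9, 10]
print(round(rank_correlation(r1, r2), 2))   # 0.71
```

Calling rank_correlation with identical rankings returns 1, and with exactly reversed rankings such as [1, 2, 3, 4, 5] and [5, 4, 3, 2, 1] it returns -1, the two limiting cases of the coefficient.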


Interpretation of Rank Correlation Coefficient

1. The rank correlation coefficient varies between -1 and +1.

2. When R is equal to +1, there is complete agreement in the order of ranks of the two variables.

Rank in first   Rank in second   d   d²
      1               1          0   0
      2               2          0   0
      3               3          0   0
      4               4          0   0
      5               5          0   0
Total                                0

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 0}{5 \times (25 - 1)} = 1 - 0 = 1$$


3. When R is equal to -1, there is complete agreement in the order of ranks, but in the reverse (opposite) direction.

Rank in first   Rank in second    d    d²
      1               5          -4    16
      2               4          -2     4
      3               3           0     0
      4               2           2     4
      5               1           4    16
Total                                  40

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 40}{5 \times 24} = 1 - 2 = -1$$

4. When R is equal to 0, there is no agreement at all in the order of ranks.
Exercise

(1) Calculate the rank correlation for the following data.

(i) Ranks assigned
A:   1   3   4   5   6   7   2
B:   4   7   5   6   3   2   1

(ii) Marks assigned
A:  25  55  50  35  45  59  62  28
B:  35  65  40  25  60  62  76  42




CHAPTER VI 


INDEX NUMBERS 


An index number is a device for estimating the relative movement of the values of a statistical variable in cases where measurement of its actual movement is inconvenient or impossible. An index number is an appropriate measure when the variable in question is not stable in its composition. The method of index numbers is based on relatives (ratios).

The measures of central tendency give an average value for a group of figures. Index numbers go further and give an average for items that are expressed in different units of measurement. For example, the price of grain is expressed per kilogram, the price of oil per litre and the price of cloth per metre. Still, by means of index numbers, we can give an average price for these items, which are measured in different units.


Classification of Index Numbers

In the study of economics, various types of index numbers are used. Some of the important ones are as follows:

1. Price Index Numbers.

2. Quantity Index Numbers.

3. Value Index Numbers.

4. Index Numbers for Special Purposes.

Generally, we come across price index numbers and quantity index numbers. Let us therefore study in detail the price index number, which is the most commonly used in all spheres of economic activity.
Construction of Price Index Numbers

Suppose we are given the price of rice prevailing in two different periods of time. We can then calculate the price index.

The price of rice in 1971 = Rs. 2 per kg.
The price of rice in 1978 = Rs. 3 per kg.

In such circumstances, the price prevailing in one period can be taken as the base for comparison, and the price prevailing in the other period can be expressed as a ratio of the price in the base period. In our example, the price of rice in 1971 can be taken as the base-year price, while the price in 1978 is taken as the current-year price.

$$\text{Index number of the price of rice in 1978} = \frac{\text{Price per unit of rice in 1978}}{\text{Price per unit of rice in 1971}} = \frac{\text{Rs. } 3}{\text{Rs. } 2} = 1.5$$

The price prevailing in the current year is divided by the price prevailing in the base year and expressed as a ratio. In the above example the ratio is 1.5. This means that the price of rice has increased by half of the base-year price; in other words, the price has increased by 50%. Generally, the ratio is converted into a percentage by multiplying it by 100, to facilitate easy comprehension, comparison and computation.

The formula for the price index is

$$\text{Price Index} = \frac{\text{Current year's price}}{\text{Base year's price}} \times 100$$

So in all our studies, the index for the base year is always taken as 100.
Example: Calculate the price index for the following data, taking 1970 as the base year.

Year   Price per unit of      Price Index
       measurement (Rs.)
1970          4               4/4 × 100  = 100
1971          5               5/4 × 100  = 125
1972          6               6/4 × 100  = 150
1973          7               7/4 × 100  = 175
1974          8               8/4 × 100  = 200
1975         10               10/4 × 100 = 250


Quantity Index

We have considered the price index. Let us now consider the quantity index. In the case of the quantity index, we consider the quantity of the item in the base year and in the current year.

$$\text{Quantity Index} = \frac{\text{Current year's quantity}}{\text{Base year's quantity}} \times 100$$

Construct the quantity index in the following case, taking the 1972 quantity as the base.
Year   Quantity in quintals   Quantity Index
1972          12              12/12 × 100 = 100.0
1973          14              14/12 × 100 = 116.7
1974          15              15/12 × 100 = 125.0
1975          18              18/12 × 100 = 150.0
1976          20              20/12 × 100 = 166.7
1977          25              25/12 × 100 = 208.3
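Both the price index and the quantity index above divide each period's figure by the base-period figure and multiply by 100. A single Python helper (the name simple_index is illustrative) covers both examples; results are rounded to one decimal place, as in the quantity table.

```python
def simple_index(series, base_key):
    """Index for each period: value / base-period value x 100, to one decimal."""
    base = series[base_key]
    return {period: round(value / base * 100, 1) for period, value in series.items()}


prices = {1970: 4, 1971: 5, 1972: 6, 1973: 7, 1974: 8, 1975: 10}
quantities = {1972: 12, 1973: 14, 1974: 15, 1975: 18, 1976: 20, 1977: 25}

print(simple_index(prices, 1970))
# {1970: 100.0, 1971: 125.0, 1972: 150.0, 1973: 175.0, 1974: 200.0, 1975: 250.0}
print(simple_index(quantities, 1972))
# {1972: 100.0, 1973: 116.7, 1974: 125.0, 1975: 150.0, 1976: 166.7, 1977: 208.3}
```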


In our examples we have considered only one item in the construction of index numbers. But in day-to-day life we come across many items, and naturally our index number should represent all of them. Therefore we have to consider methods which take more than one item into account. There are different methods, and we shall examine each one separately.


Simple Aggregative Method


Suppose we have five commodities, and the prices per unit of these commodities in different periods are as given below. We can calculate the index numbers of prices by the Simple Aggregative Method.
Price per quintal (Rs.)

Grain      1960    1965    1970
Rice        51      60      65
Wheat       54      65      70
Cholam      48      60      70
Cumbu       45      65      70
Korra       42      50      55
TOTAL      240     300     330

Index number of prices in 1960 = 240/240 × 100 = 100.0
Index number of prices in 1965 = 300/240 × 100 = 125.0
Index number of prices in 1970 = 330/240 × 100 = 137.5

The total (aggregate) of the prices per unit of all the commodities in the current year is divided by the total (aggregate) of the prices per unit of all the commodities in the base year, and the ratio is multiplied by 100:

$$I_n = \frac{\sum p_n}{\sum p_0} \times 100$$


In our example we have taken different commodities, but the unit of measurement is the same for all of them. Sometimes the commodities may be in different units, but the method of computing the index number is the same as before.

Commodity   Unit of        Price per unit (Rs.)
            measurement    1960       1965
Rice        Kilogram       2.00       2.50
Oil         Litre          4.00       5.00
Cloth       Metre          5.00       7.25
Coconut     Number         1.00       1.25
Total                     12.00      16.00

Index Number = 16/12 × 100 = 133.33

The formula can be written as follows:

$$I_n = \frac{p_{1n} + p_{2n} + p_{3n} + p_{4n}}{p_{10} + p_{20} + p_{30} + p_{40}} \times 100 = \frac{\sum p_n}{\sum p_0} \times 100$$

The numbers 1, 2, 3, 4 indicate the serial numbers of the commodities; n indicates the current year; 0 indicates the base year; p indicates prices.


Disadvantage

The great disadvantage of this method is that all the items are given equal importance, since only one unit of each item is taken. But in actual life we attach different importance to different items. Further, the prices of different commodities are expressed in different units. If we convert the prices of all the commodities to one uniform unit, say the price per kilogram, the index number worked out on that basis will be different from the one calculated earlier. Therefore the index number worked out by the simple aggregative method will not reflect the true and practical situation.
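The simple aggregative index, and its sensitivity to the choice of units, can be seen in a short Python sketch (the function name is illustrative). The change of unit for rice, per quintal of 100 kg instead of per kilogram, is a hypothetical illustration and not part of the data above: the index moves from 133.33 to about 125.48 although no price has changed relative to its base.

```python
def simple_aggregative_index(base_prices, current_prices):
    """Sum of current-year prices / sum of base-year prices x 100."""
    return sum(current_prices) / sum(base_prices) * 100


# Rice (kg), oil (litre), cloth (metre), coconut (number): 1960 and 1965
p0 = [2.00, 4.00, 5.00, 1.00]
pn = [2.50, 5.00, 7.25, 1.25]
print(round(simple_aggregative_index(p0, pn), 2))       # 133.33

# Hypothetical change of unit: rice quoted per quintal (100 kg) instead of per kg
p0_q = [200.00, 4.00, 5.00, 1.00]
pn_q = [250.00, 5.00, 7.25, 1.25]
print(round(simple_aggregative_index(p0_q, pn_q), 2))   # 125.48
```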


Simple Average of Relatives


We have already seen that the price index represents the relative change in prices. Hence we shall use the price relative of each commodity instead of the actual prices of the different commodities.


We shall consider the same example as before:


Commodity    Unit       Price per unit (Rs.)    Price relative
                        1960       1965
1. Rice      Kilogram   2.00       2.50         2.50/2.00 = 1.25
2. Oil       Litre      4.00       5.00         5.00/4.00 = 1.25
3. Cloth     Metre      5.00       7.25         7.25/5.00 = 1.45
4. Coconut   Number     1.00       1.25         1.25/1.00 = 1.25
Total                  12.00      16.00                     5.20

Average price relative = Total of price relatives / Number of commodities = 5.20/4 = 1.30

Index Number = 1.30 × 100 = 130
Formula

$$I = \frac{1}{4}\left(\frac{p_{1n}}{p_{10}} + \frac{p_{2n}}{p_{20}} + \frac{p_{3n}}{p_{30}} + \frac{p_{4n}}{p_{40}}\right) \times 100 = \frac{1}{N}\sum \frac{p_n}{p_0} \times 100$$

where N indicates the number of commodities, n indicates the current year and 0 indicates the base year.

In the above method we have calculated the index number by averaging the price relatives arithmetically; in other words, we have taken the Arithmetic Mean of the price relatives. Instead of the Arithmetic Mean, we can also use the Geometric Mean. In that case the formula changes as follows:

$$I = \left(\prod \frac{p_n}{p_0}\right)^{1/N} \times 100$$


The price relatives calculated in the previous example are given below:

Commodity   Price Relative
Rice        1.25
Oil         1.25
Cloth       1.45
Coconut     1.25

Geometric Mean of the price relatives:

$$G = \left(\prod \frac{p_n}{p_0}\right)^{1/N} = (1.25 \times 1.25 \times 1.45 \times 1.25)^{1/4}$$

$$\log G = \frac{\log 1.25 + \log 1.25 + \log 1.45 + \log 1.25}{4} = \frac{0.0969 + 0.0969 + 0.1614 + 0.0969}{4} = \frac{0.4521}{4} = 0.1130$$

$$G = \text{antilog}\,(0.1130) = 1.2972$$

Index Number = 1.2972 × 100 = 129.7


When we use the Arithmetic Mean we get 130 as the index number, but when we use the Geometric Mean we get 129.7. This is because the Arithmetic Mean of a set of positive values is always greater than or equal to their Geometric Mean, the two being equal only when all the values are the same.


Geometric Mean is preferred to the Arithmetic Mean

In the problem of index numbers we are interested in relative changes rather than in changes in absolute values. The Geometric Mean gives a better result, since it measures relative change, while the Arithmetic Mean measures absolute change. Hence the Geometric Mean is preferred to the Arithmetic Mean in the calculation of index numbers. But the Arithmetic Mean is widely used because of the simplicity of its computation.
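Both averages of the price relatives can be computed in Python; the function names are illustrative. The geometric mean is taken through base-10 logarithms, mirroring the log and antilog steps above, and it comes out slightly below the arithmetic mean.

```python
import math


def arithmetic_mean_index(relatives):
    """Simple arithmetic mean of the price relatives, x 100."""
    return sum(relatives) / len(relatives) * 100


def geometric_mean_index(relatives):
    """Geometric mean of the price relatives, x 100, via base-10 logarithms."""
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log * 100          # antilog of the mean log


# Price relatives p_n/p_0 for rice, oil, cloth and coconut
relatives = [2.50 / 2.00, 5.00 / 4.00, 7.25 / 5.00, 1.25 / 1.00]
am = arithmetic_mean_index(relatives)
gm = geometric_mean_index(relatives)
print(round(am, 1), round(gm, 1))   # 130.0 129.7
```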


Disadvantages

Of the two disadvantages mentioned for the simple aggregative method, the one due to different units for different items is removed. However, the first defect, namely the equal importance given to all the items, still remains. This defect can also be removed by giving due importance to the respective items. The method which gives each item due weight according to its importance is known as the Weighted Method, and the index number calculated by this method is known as a Weighted Index Number.


Weighted Index Numbers 


Index Numbers calculated by giving weights to various items 
according to their importance are called Weighted Index Numbers. 

The weights to the various items can be given so as to bring 
out their economic importance. In certain cases , the quantities 
of the different items can be taken as the weight. In some other 
cases the values of the items can be taken as the weights. However, 
the following three kinds of weights are generally adopted . 


TYPES OF WEIGHTS 


1. Price Weights 

In this case , the items included in the index numbers are 
given importance according to their prices . 


2. Quantity Weights 

The items included in the index numbers are given importance according to the quantity purchased, sold or consumed.


3. Value Weights 


In this case the various items are given importance according 
to the expenditure incurred on those items . 


There are two schools of thought on this. Though there is no difference of opinion on the need for weights, there is a difference in approach to choosing them. One school of thought prefers the current-year prices, quantities or values, as the case may be, as the appropriate weights, while the other prefers the base-year prices, quantities or values. We shall study these two approaches in detail.
Laspeyres' Index Method

In this method, the quantity of the base year is taken as the weight for calculating the price index number:

$$I = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$

For calculating the index number of prices we require (1) the prices of the different commodities both in the current year and in the base year, and (2) the quantities of the different commodities in the base year (q_0).


Aggregate Method


This method is known as the Aggregate Method since p × q gives the value of each item, and consequently Σpq gives the total value.


Example: Calculate the index number of prices from the following data:

TABLE NO. I

Commodity   Unit     Quantity purchased   Price per unit in   Price per unit in
                     in the base year     the base year       the current year
                                          (Rs.)               (Rs.)
(1)         (2)      (3)                  (4)                 (5)
Rice        kg       20                   2.00                2.50
Oil         Litre     5                   4.00                5.00
Cloth       Metre    10                   5.00                7.25
Coconut     Number   12                   1.00                1.25

From the above table we can construct the following table.
TABLE NO. II

Commodity   Total value in the        Total value in the current year
            base year (Rs.)           at base-year quantities (Rs.)
(1)         (2)                       (3)
Rice        20 × 2.00 =  40.00        20 × 2.50 =  50.00
Oil          5 × 4.00 =  20.00         5 × 5.00 =  25.00
Cloth       10 × 5.00 =  50.00        10 × 7.25 =  72.50
Coconut     12 × 1.00 =  12.00        12 × 1.25 =  15.00
Total                   122.00                    162.50

The price of each commodity in the base year is multiplied by its quantity, and the values so obtained are given in col. (2) of Table No. II. Similarly, the values in col. (3) of Table No. II are obtained by multiplying the prices in the current year by the base-year quantities.


We get the following details from Table No. II:
(1) Σp_n q_0 = Rs. 162.50
(2) Σp_0 q_0 = Rs. 122.00

$$\text{Index Number of Prices} = \frac{162.50}{122.00} \times 100 = 133.2$$

Average of Ratios or Average of Price Relatives

Laspeyres' formula can also be written as a weighted average of price relatives.
$$I = \frac{\sum \left(\frac{p_n}{p_0} \times p_0 q_0\right)}{\sum p_0 q_0} \times 100$$

In this formula the base-year values of the commodities are taken as the weights. This method can be adopted if we are given the following details:

1. The base-year values of the different commodities.
2. The base-year prices of the commodities.
3. The current-year prices of the commodities.


Commodity   Quantity      Base year      Current year
                          price (Rs.)    price (Rs.)
Rice        20 kg         2.00           2.50
Oil          5 litres     4.00           5.00
Cloth       10 metres     5.00           7.25
Coconut     12 numbers    1.00           1.25

By multiplying the base-year price and the base-year quantity of each commodity, we can find the base-year value of each commodity:

Rice        20 × 2  =  Rs. 40
Oil          5 × 4  =  Rs. 20
Cloth       10 × 5  =  Rs. 50
Coconut     12 × 1  =  Rs. 12
Total                  Rs. 122


We shall now construct the price relatives.

Commodity   Base year     Current year   Price relative     Price relative ×
            price (Rs.)   price (Rs.)    col.(3) ÷ col.(2)  base-year value
(1)         (2)           (3)            (4)                (5)
Rice        2.00          2.50           1.25               40 × 1.25 = 50.00
Oil         4.00          5.00           1.25               20 × 1.25 = 25.00
Cloth       5.00          7.25           1.45               50 × 1.45 = 72.50
Coconut     1.00          1.25           1.25               12 × 1.25 = 15.00
Total                                                                  162.50

(1) Σ(p_n/p_0 × p_0 q_0) = 162.50
(2) Σp_0 q_0 = 122.00

$$\text{Index Number} = \frac{162.50}{122.00} \times 100 = 133.2$$


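We now find that the index numbers calculated with the help of the following two formulae are one and the same:

$$(1)\quad \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = 133.2$$

$$(2)\quad \frac{\sum \left(\frac{p_n}{p_0} \times p_0 q_0\right)}{\sum p_0 q_0} \times 100 = 133.2$$

This is to be expected: since (p_n/p_0) × p_0 q_0 = p_n q_0 for every commodity, both formulae are one and the same.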
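The sketch below computes Laspeyres' index in both forms for the example data and confirms that they agree. The function names are illustrative, not standard library calls.

```python
def laspeyres_aggregate(p0, pn, q0):
    """Sum(pn*q0) / Sum(p0*q0) x 100, with base-year quantities as weights."""
    num = sum(p * q for p, q in zip(pn, q0))
    den = sum(p * q for p, q in zip(p0, q0))
    return num / den * 100


def laspeyres_relatives(p0, pn, q0):
    """Weighted mean of price relatives pn/p0, weights = base-year values p0*q0."""
    weights = [p * q for p, q in zip(p0, q0)]
    weighted = sum((b / a) * w for a, b, w in zip(p0, pn, weights))
    return weighted / sum(weights) * 100


p0 = [2.00, 4.00, 5.00, 1.00]      # base-year prices: rice, oil, cloth, coconut
pn = [2.50, 5.00, 7.25, 1.25]      # current-year prices
q0 = [20, 5, 10, 12]               # base-year quantities

print(round(laspeyres_aggregate(p0, pn, q0), 1))   # 133.2
print(round(laspeyres_relatives(p0, pn, q0), 1))   # 133.2
```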
Paasche's Index Number

In this method, the quantity of each commodity in the current year is used as the weight. We can calculate Paasche's index number when we are given the following three details:

1. The base-year price of each commodity.
2. The current-year price of each commodity.
3. The current-year quantity of each commodity.

$$I = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$$


Aggregate Method


In the above formula we use the total values of the commodities both in the current year and in the base year, but the quantity used is the current-year quantity. Let us examine the same example with the quantities of the current year.


Commodity   Unit     Quantity in the    Price per unit (Rs.)
                     current year       Base year   Current year
Rice        kg       24                 2.00        2.50
Oil         Litre     8                 4.00        5.00
Cloth       Metre    12                 5.00        7.25
Coconut     Number   16                 1.00        1.25

From the above table we construct another table giving the values of the commodities, obtained by multiplying the prices by the current-year quantities.
Commodity   Value in the base year (Rs.)   Value in the current year (Rs.)
Rice        24 × 2.00 =  48                24 × 2.50 =  60
Oil          8 × 4.00 =  32                 8 × 5.00 =  40
Cloth       12 × 5.00 =  60                12 × 7.25 =  87
Coconut     16 × 1.00 =  16                16 × 1.25 =  20
Total                   156                            207

(1) Total value of the commodities in the current year: Σp_n q_n = Rs. 207
(2) Total value of the current-year quantities at base-year prices: Σp_0 q_n = Rs. 156

$$\text{Index Number} = \frac{207}{156} \times 100 = 132.7$$


Average of Ratios (or) Average of Price Relatives

Paasche's formula can be written as a weighted average of price relatives:

$$I = \frac{\sum \left(\frac{p_n}{p_0} \times p_0 q_n\right)}{\sum p_0 q_n} \times 100$$

In this formula the value of the current-year quantity (q_n) at the base-year price (p_0), that is p_0 q_n, is taken as the weight. This method can be adopted if we are given the following details:

(1) The base-year price of the commodity.
(2) The current-year price of the commodity.
(3) The value of the current-year quantity of the commodity at the base-year price.


Commodity   Quantity in the    Base year      Current year
            current year       price (Rs.)    price (Rs.)
(1)         (2)                (3)            (4)
Rice        24 kg              2.00           2.50
Oil          8 litres          4.00           5.00
Cloth       12 metres          5.00           7.25
Coconut     16 nos.            1.00           1.25

By multiplying columns (2) and (3) we get the value, which serves as the weight:

Commodity   Value (Rs.)
Rice         48
Oil          32
Cloth        60
Coconut      16
Total       156

Commodity   Price relative   Price relative × weight
Rice        1.25             1.25 × 48 = 60
Oil         1.25             1.25 × 32 = 40
Cloth       1.45             1.45 × 60 = 87
Coconut     1.25             1.25 × 16 = 20
Total                                   207
$$\text{Index Number} = \frac{207}{156} \times 100 = 132.7$$
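The weighted average of price relatives can be traced item by item in a small Python sketch. It is illustrative only: the variable names are assumptions, and the weights are the values $p_0 q_n$ from the table above. It reproduces the totals 207 and 156, and hence the same index, 132.7, as the aggregate method.

```python
# (current year quantity q_n, base price p_0, current price p_n), from the table above
data = {
    "Rice":    (24, 2.00, 2.50),
    "Oil":     (8, 4.00, 5.00),
    "Cloth":   (12, 5.00, 7.25),
    "Coconut": (16, 1.00, 1.25),
}

weighted_sum = 0.0  # running total of (price relative x weight)
total_weight = 0.0  # running total of the weights p_0 q_n
for name, (q_n, p_0, p_n) in data.items():
    relative = p_n / p_0   # price relative
    weight = p_0 * q_n     # current year quantity valued at the base year price
    weighted_sum += relative * weight
    total_weight += weight
    print(f"{name:<8} {relative:.2f} x {weight:.0f} = {relative * weight:.0f}")

index = weighted_sum / total_weight * 100
print(f"Index Number = {weighted_sum:.0f} / {total_weight:.0f} x 100 = {index:.1f}")
```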


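We find that the index numbers calculated with the following two formulae are one and the same, because for every commodity $\dfrac{p_n}{p_0} \times p_0 q_n = p_n q_n$:

$$\frac{\sum p_n q_n}{\sum p_0 q_n} \times 100 \qquad (1)$$

$$\frac{\sum \left( \dfrac{p_n}{p_0} \times p_0 q_n \right)}{\sum p_0 q_n} \times 100 \qquad (2)$$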


Comparison of Laspeyres' Method and Paasche's Method

1. In both formulae we use the prices prevailing in the base year and in the current year.

2. The price relatives used, $p_n / p_0$, are also the same.

3. In Laspeyres' method, the base year quantity ($q_0$) is used as the weight when the actual prices are used.

4. In Paasche's method, the current year quantity ($q_n$) is used as the weight when the actual prices are used.

5. When price relatives are used, Laspeyres' method takes as the weight, instead of the actual quantity, the actual expenditure incurred in the base year ($p_0 q_0$).

6. When price relatives are used, Paasche's method takes as the weight, instead of the actual quantity, not an actual expenditure but the value of the current year quantity at the base year price level ($p_0 q_n$), which is imaginary rather than actually incurred.

Because of this basic difference between items 5 and 6, Laspeyres' method is preferred to Paasche's method.
Marshall-Edgeworth Index Method

A compromise between Laspeyres' formula and Paasche's formula is also used to calculate the price index number. In this method, the sum of the quantities of the base year ($q_0$) and the current year ($q_n$), i.e. ($q_0 + q_n$), is used as the weight. The formula is given below:

$$\text{Price Index Number} = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \times 100$$

(or)

$$\text{Price Index Number} = \frac{\sum p_n q_0 + \sum p_n q_n}{\sum p_0 q_0 + \sum p_0 q_n} \times 100$$

In this method the weights ($q_0 + q_n$) change every year, because the current year quantity changes, and hence the index numbers of different years are not suitable for comparison. In Laspeyres' method, on the other hand, the weights ($q_0$, or $p_0 q_0$ when price relatives are used) are fixed, and hence those index numbers can be compared directly.


Commodity        Quantity                     Price
                 Base year    Current year    Base year     Current year
                 ($q_0$)      ($q_n$)         ($p_0$), Rs.  ($p_n$), Rs.
Rice (kg)        20           24              2.00          2.50
Oil (litre)       5            8              4.00          5.00
Cloth (metre)    10           12              5.00          7.25
Coconut (No.)    12           16              1.00          1.25
When we adopt the sum of the quantities of both the base year and the current year, the weights will be as follows:

Rice       20 + 24 = 44 kg.
Oil         5 + 8  = 13 litres
Cloth      10 + 12 = 22 metres
Coconut    12 + 16 = 28 numbers


After computing the weights, we can apply the aggregate method.


Aggregate Method

Commodity   Quantity weight   Value in the base year, Rs.   Value in the current year, Rs.
            ($q_0 + q_n$)     ($p_0 (q_0 + q_n)$)           ($p_n (q_0 + q_n)$)
Rice        44                44 x 2.00 = 88                44 x 2.50 = 110.00
Oil         13                13 x 4.00 = 52                13 x 5.00 = 65.00
Cloth       22                22 x 5.00 = 110               22 x 7.25 = 159.50
Coconut     28                28 x 1.00 = 28                28 x 1.25 = 35.00
Total                         278                           369.50

$$\sum p_n (q_0 + q_n) = 369.50, \qquad \sum p_0 (q_0 + q_n) = 278$$

$$\text{Index Number} = \frac{369.50}{278} \times 100 = 132.9$$
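The same aggregate calculation with $(q_0 + q_n)$ as the weight can be sketched in Python. The helper name `marshall_edgeworth` and the tuple layout are assumptions for this illustration; the figures are those of the table above.

```python
# (base quantity q_0, current quantity q_n, base price p_0, current price p_n)
data = {
    "Rice":    (20, 24, 2.00, 2.50),
    "Oil":     (5, 8, 4.00, 5.00),
    "Cloth":   (10, 12, 5.00, 7.25),
    "Coconut": (12, 16, 1.00, 1.25),
}


def marshall_edgeworth(rows):
    """Return (sum p_n(q_0+q_n), sum p_0(q_0+q_n), index) with q_0 + q_n as the weight."""
    numerator = sum(p_n * (q_0 + q_n) for q_0, q_n, p_0, p_n in rows)
    denominator = sum(p_0 * (q_0 + q_n) for q_0, q_n, p_0, p_n in rows)
    return numerator, denominator, numerator / denominator * 100


numerator, denominator, index = marshall_edgeworth(data.values())
print(f"{numerator:.2f} / {denominator:.2f} x 100 = {index:.1f}")  # 369.50 / 278.00 x 100 = 132.9
```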
Quantity Index

We have so far considered the construction of index numbers of prices. In all these cases we have used the following items:

1. Price relative, $p_n / p_0$.

2. Any one of the following items as weight:

i. Base year quantity ($q_0$)

ii. Current year quantity ($q_n$)

iii. Sum of the quantities of both base year and current year ($q_0 + q_n$)

In the same manner we can also calculate the Quantity Index. In this we use quantity relatives, $q_n / q_0$, instead of price relatives, and we use price as the weight. The following items are used:

1. Quantity relative, $q_n / q_0$.

2. Any one of the following prices as weight:

i. Base year price ($p_0$)

ii. Current year price ($p_n$)

iii. Sum of both base year and current year prices ($p_0 + p_n$)
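The corresponding formulae will be as follows:

$$\frac{\sum q_n p_0}{\sum q_0 p_0} \times 100 \qquad (1)$$

$$\frac{\sum q_n p_n}{\sum q_0 p_n} \times 100 \qquad (2)$$

$$\frac{\sum q_n (p_0 + p_n)}{\sum q_0 (p_0 + p_n)} \times 100 \qquad (3)$$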
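The three quantity index formulae differ only in the price used as the weight, so a sketch can pass the weight in as a function. The text does not work out a numerical quantity index; the four-commodity data used for the price indices are reused here purely for illustration, and the helper `quantity_index` is a hypothetical name.

```python
# (q_0, q_n, p_0, p_n), reusing the four-commodity table given for the price indices
data = {
    "Rice":    (20, 24, 2.00, 2.50),
    "Oil":     (5, 8, 4.00, 5.00),
    "Cloth":   (10, 12, 5.00, 7.25),
    "Coconut": (12, 16, 1.00, 1.25),
}


def quantity_index(rows, weight):
    """sum q_n w / sum q_0 w x 100, where w = weight(p_0, p_n) is the price weight."""
    numerator = sum(q_n * weight(p_0, p_n) for q_0, q_n, p_0, p_n in rows)
    denominator = sum(q_0 * weight(p_0, p_n) for q_0, q_n, p_0, p_n in rows)
    return numerator / denominator * 100


rows = list(data.values())
q_base_prices = quantity_index(rows, lambda p_0, p_n: p_0)          # formula (1)
q_current_prices = quantity_index(rows, lambda p_0, p_n: p_n)       # formula (2)
q_sum_of_prices = quantity_index(rows, lambda p_0, p_n: p_0 + p_n)  # formula (3)
print(f"{q_base_prices:.1f} {q_current_prices:.1f} {q_sum_of_prices:.1f}")
```

Passing the weight as a function keeps one routine for all three formulae; only the price chosen as the weight changes.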
Value Index

As we have calculated index numbers for prices and quantities, we can also calculate an index number for values. When we multiply the price by the quantity we get the value:

$p_n q_n$: value of the commodity in the current year.
$p_0 q_0$: value of the commodity in the base year.

$$\text{Index Number } V = \frac{\sum p_n q_n}{\sum p_0 q_0} \times 100$$


Commodity   Quantity                    Price, Rs.
            Base year   Current year    Base year   Current year
(1)         (2)         (3)             (4)         (5)
Rice        20          24              2.00        2.50
Oil          5           8              4.00        5.00
Cloth       10          12              5.00        7.25
Coconut     12          16              1.00        1.25


From the above table we can compute the values of the commodities in the base year and the current year.

Commodity   Value in the base year, Rs.   Value in the current year, Rs.
            ($p_0 q_0$)                   ($p_n q_n$)
Rice        40                            60
Oil         20                            40
Cloth       50                            87
Coconut     12                            20
Total       122                           207

$$\text{Index Number} = \frac{207}{122} \times 100 = 169.7$$
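As an illustrative check, the value index can be computed directly from the quantities and prices. The tuple layout and variable names are assumptions made for this sketch.

```python
# (q_0, q_n, p_0, p_n) from the table above
data = {
    "Rice":    (20, 24, 2.00, 2.50),
    "Oil":     (5, 8, 4.00, 5.00),
    "Cloth":   (10, 12, 5.00, 7.25),
    "Coconut": (12, 16, 1.00, 1.25),
}

base_value = sum(q_0 * p_0 for q_0, q_n, p_0, p_n in data.values())     # sum p_0 q_0
current_value = sum(q_n * p_n for q_0, q_n, p_0, p_n in data.values())  # sum p_n q_n
value_index = current_value / base_value * 100
print(f"{current_value:.0f} / {base_value:.0f} x 100 = {value_index:.1f}")  # 207 / 122 x 100 = 169.7
```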
Fisher's Ideal Index Number

A compromise between Laspeyres' formula and Paasche's formula is also adopted. This formula is known as Fisher's Ideal Index Number. It is the geometric mean of the above two formulae:

$$\text{Index Number} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100$$


Ideal Index Number 


It is called an ideal index number for the following reasons.

1. The geometric mean is regarded as the most suitable average for measuring relative changes. Since an index number measures relative change and Fisher's index is based on the geometric mean, it is considered better than the other index numbers.

2. The prices and quantities of both the base year and the current year are taken into account in the construction of the index number.

3. It satisfies the three tests, namely (i) the Commodity Reversal Test, (ii) the Time Reversal Test and (iii) the Factor Reversal Test.

However, this index number is not widely used because of the laborious calculations involved in its construction.

Let us calculate Fisher's Ideal Index Number for the following data:


Commodity   Quantities                  Prices, Rs.
            Base year   Current year    Base year   Current year
            ($q_0$)     ($q_n$)         ($p_0$)     ($p_n$)
Rice        20          24              2.00        2.50
Oil          5           8              4.00        5.00
Cloth       10          12              5.00        7.25
Coconut     12          16              1.00        1.25
1. We first calculate the index number by Laspeyres' method:

$$I = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100 = \frac{50 + 25 + 72.50 + 15}{40 + 20 + 50 + 12} \times 100 = \frac{162.50}{122.00} \times 100 = 133.2$$

2. We then calculate the index number by Paasche's method:

$$I = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100 = \frac{60 + 40 + 87 + 20}{48 + 32 + 60 + 16} \times 100 = \frac{207}{156} \times 100 = 132.7$$


3. We calculate the product of these two index numbers:

$$133.2 \times 132.7$$

4. We then find the square root of their product:

$$I = \sqrt{133.2 \times 132.7}$$

Taking logarithms on both sides we get

$$\log I = \frac{\log 133.2 + \log 132.7}{2} = \frac{2.1245 + 2.1229}{2} = \frac{4.2474}{2} = 2.1237$$

$$\text{Index Number} = \text{antilog } 2.1237 = 132.9$$
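A short Python sketch, assuming the same four-commodity data, computes both component indices from the unrounded totals and takes their geometric mean both directly and through logarithms, as in steps 3 and 4. Because it does not round 133.2 and 132.7 first, the intermediate figures differ slightly from the hand calculation, but the index still rounds to 132.9. The variable names are illustrative.

```python
import math

# (q_0, q_n, p_0, p_n) for each commodity, from the table above
rows = [
    (20, 24, 2.00, 2.50),   # Rice
    (5, 8, 4.00, 5.00),     # Oil
    (10, 12, 5.00, 7.25),   # Cloth
    (12, 16, 1.00, 1.25),   # Coconut
]

sum_pn_q0 = sum(p_n * q_0 for q_0, q_n, p_0, p_n in rows)  # 162.50
sum_p0_q0 = sum(p_0 * q_0 for q_0, q_n, p_0, p_n in rows)  # 122.00
sum_pn_qn = sum(p_n * q_n for q_0, q_n, p_0, p_n in rows)  # 207.00
sum_p0_qn = sum(p_0 * q_n for q_0, q_n, p_0, p_n in rows)  # 156.00

laspeyres = sum_pn_q0 / sum_p0_q0 * 100
paasche = sum_pn_qn / sum_p0_qn * 100
fisher = math.sqrt(laspeyres * paasche)  # geometric mean of the two indices
# Same result via logarithms: antilog of the mean of the two logs
fisher_via_logs = 10 ** ((math.log10(laspeyres) + math.log10(paasche)) / 2)

print(f"Laspeyres {laspeyres:.1f}, Paasche {paasche:.1f}, Fisher {fisher:.1f}")
```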


Fixed Base and Chain Base Index Numbers

Fixed Base

When we have values for a series of years, we can take any one of the years as the base and calculate the index numbers for the values of the remaining years with reference to it. This is known as the Fixed Base Method. Index numbers of this type can be compared more easily and effectively because they share a common base. An example is given below:

Year    Price    Index Number
1960    20       100
1961    25       125
1962    40       200
1963    50       250
1964    70       350

Here 1960 is taken as the fixed base, and the index number for each of the other years is (price of that year / price in 1960) x 100; for example, for 1962 it is 40/20 x 100 = 200.
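The fixed base calculation can be sketched in Python as follows. The function name `fixed_base_indices` is an assumption for this illustration; the prices are those of the table above.

```python
prices = {1960: 20, 1961: 25, 1962: 40, 1963: 50, 1964: 70}


def fixed_base_indices(series, base_year):
    """Express every value as a percentage of the value in base_year."""
    base = series[base_year]
    return {year: value / base * 100 for year, value in series.items()}


for year, index in fixed_base_indices(prices, 1960).items():
    print(year, f"{index:.0f}")
```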


Chain Base Method 


In this method, a common period is not taken as the base year. Instead, each year is compared with the year immediately preceding it, which serves as its base. Because the years are linked to one another in this way, the base is called a chain base.


Let us calculate the index number by the chain base method:

Year    Value    Chain Base Index
1960    20       20/20 x 100 = 100
1961    25       25/20 x 100 = 125
1962    40       40/25 x 100 = 160
1963    50       50/40 x 100 = 125
1964    70       70/50 x 100 = 140
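A Python sketch of the chain base calculation is given below; the function name `chain_base_indices` is hypothetical, and the first year is given the index 100 as in the table above.

```python
values = {1960: 20, 1961: 25, 1962: 40, 1963: 50, 1964: 70}


def chain_base_indices(series):
    """Express each year's value as a percentage of the preceding year's value."""
    years = sorted(series)
    result = {years[0]: 100.0}  # the first year has no predecessor, so it is taken as 100
    for previous, year in zip(years, years[1:]):
        result[year] = series[year] / series[previous] * 100
    return result


for year, index in chain_base_indices(values).items():
    print(year, f"{index:.0f}")
```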
Conversion from Chain Base to Fixed Base

A chain base index number can be converted into a fixed base index number. The fixed base index of a year is its chain base index multiplied by the fixed base index of the previous year and divided by 100; equivalently, it is the product of the chain base indices (each divided by 100) of all the years up to that year, multiplied by 100.

Year    Chain Base Index    Fixed Base Index
1960    100                 100/100 x 100 = 100
1961    125                 125/100 x 100 = 125
1962    160                 125/100 x 160/100 x 100 = 200
1963    125                 125/100 x 160/100 x 125/100 x 100 = 250
1964    140                 125/100 x 160/100 x 125/100 x 140/100 x 100 = 350
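The conversion rule can be sketched in Python. This assumes, as in the table above, that the chain index of the first year is 100, so that the first year acts as the fixed base; the function name `chain_to_fixed` is hypothetical.

```python
chain = {1960: 100, 1961: 125, 1962: 160, 1963: 125, 1964: 140}


def chain_to_fixed(chain_indices):
    """Fixed base index = chain index of the year x fixed base index of the previous year / 100."""
    fixed = {}
    previous_fixed = 100.0  # the first year acts as the fixed base
    for year in sorted(chain_indices):
        previous_fixed = chain_indices[year] * previous_fixed / 100
        fixed[year] = previous_fixed
    return fixed


for year, index in chain_to_fixed(chain).items():
    print(year, f"{index:.0f}")
```

Applied to the chain indices of the previous example, it recovers the fixed base indices 100, 125, 200, 250 and 350 obtained directly with 1960 as the base.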


Merits of the chain base method

1. People are generally interested in comparing the current year with the immediately preceding year rather than with the remote past. In such cases the chain base method is more useful.

2. It accommodates changes that take place in quick succession: new items can be added and old items deleted.

3. It gives a quick, direct comparison of successive years.

4. It also makes it easy to change the base as and when desired.


Tests of consistency for index numbers

When we studied Fisher's Ideal Index Number, we referred to the following three tests:

i. Commodity Reversal Test.

ii. Time Reversal Test.

iii. Factor Reversal Test.

An index number which satisfies these three tests is said to be an ideal index number. Let us examine each of the tests in detail.


1. Commodity Reversal Test 


It has been said earlier that in the construction of index numbers we consider not one item but many items. These items may be arranged in some order. If the order of arrangement of the items is changed and the index number is worked out for the revised order, there will be no difference between the original index number and the revised index number. This is because no item has been added or deleted; the only change is in the order in which the items are considered, and this has no effect on the value of the index number. Therefore, some are of the opinion that this does not constitute a test at all.


2. Time Reversal Test 


We have seen that there are two periods, namely the current year and the base year, in the construction of an index number. Generally the index number for the current year is calculated with reference to the base year. Similarly, we can also calculate the index number for the base year with reference to the current year. What we normally expect is that the one will be the reciprocal of the other. In other words, when the two index numbers are expressed as ratios (that is, without the factor 100), their product should be 1.

This condition is always satisfied when only one item is considered in the construction of the index number. If more than one item is considered and a weighted arithmetic average of the relatives is used, the time reversal test is not satisfied by Laspeyres' and Paasche's formulae. This can be verified from the formulae themselves.
(a) Base year quantity as weight (Laspeyres' Method)

$$I = \frac{\sum p_n q_0}{\sum p_0 q_0}$$

Let us interchange n and 0 in the formula. The formula will be revised as

$$I' = \frac{\sum p_0 q_n}{\sum p_n q_n}$$

$$I \times I' = \frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_0 q_n}{\sum p_n q_n}$$

which, in general, is not equal to 1.


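(b) Current year quantity as weight (Paasche's Method)

$$I = \frac{\sum p_n q_n}{\sum p_0 q_n}$$

Interchanging the letters n and 0,

$$I' = \frac{\sum p_0 q_0}{\sum p_n q_0}$$

$$I \times I' = \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}$$

which, in general, is not equal to 1.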


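(c) Fisher's Index Number

$$I = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \qquad (1)$$

Let us interchange the two periods n and 0 and verify the formula with reference to the time reversal test. The formula will be

$$I' = \sqrt{\frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}} \qquad (2)$$

Multiplying (1) and (2):

$$I \times I' = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}} = \sqrt{1} = 1$$

Cancelling the common terms in the numerator and the denominator we get 1, and the square root of 1 is also 1. Hence Fisher's index satisfies the time reversal test.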
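The algebra above can be checked numerically. In this Python sketch the indices are kept as ratios (without the factor 100), so the time reversal test asks whether $I \times I'$ equals 1. The four-commodity data from the earlier examples and the helper names (`laspeyres`, `paasche`, `fisher`, `swap_periods`) are assumptions made for illustration.

```python
import math

# (q_0, q_n, p_0, p_n) from the four-commodity example used earlier
data = [
    (20, 24, 2.00, 2.50),
    (5, 8, 4.00, 5.00),
    (10, 12, 5.00, 7.25),
    (12, 16, 1.00, 1.25),
]


def laspeyres(rows):
    return sum(pn * q0 for q0, qn, p0, pn in rows) / sum(p0 * q0 for q0, qn, p0, pn in rows)


def paasche(rows):
    return sum(pn * qn for q0, qn, p0, pn in rows) / sum(p0 * qn for q0, qn, p0, pn in rows)


def fisher(rows):
    return math.sqrt(laspeyres(rows) * paasche(rows))


def swap_periods(rows):
    """Interchange the base period 0 and the current period n."""
    return [(qn, q0, pn, p0) for q0, qn, p0, pn in rows]


for formula in (laspeyres, paasche, fisher):
    product = formula(data) * formula(swap_periods(data))
    print(f"{formula.__name__:<10} I x I' = {product:.4f}")
```

With these data the product for Laspeyres' formula comes out slightly above 1 and that for Paasche's formula slightly below 1, while Fisher's product is 1 up to rounding error.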


3. Factor Reversal Test 


In the construction of price index numbers we use two factors, namely price (p) and quantity (q). In a price index number, quantity is used as the weight.

If we interchange the two factors, i.e. replace p by q and q by p, we get another index number, namely the index number of quantity with price as the weight.

If we multiply these two index numbers, i.e. the index number of price and the index number of quantity, the product should represent the index number of the total expenditure, since the product of price and quantity gives the total value of the commodity:

$$P \times Q = \frac{\sum p_n q_n}{\sum p_0 q_0}$$

This condition is satisfied when only one item is considered. But when several items are considered, as in the construction of index numbers, and especially where an arithmetic average is used, the condition is not satisfied. This can be verified by examining the two formulae used for the construction of index numbers.


(a) With base year quantity as the weight (Laspeyres' Method)

$$I_P = \frac{\sum p_n q_0}{\sum p_0 q_0}$$

Let us replace p by q and q by p. The formula becomes

$$I_Q = \frac{\sum q_n p_0}{\sum q_0 p_0}$$

which is the index number of quantity with base year prices as weights. The product of the two index numbers is

$$I_P \times I_Q = \frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum q_n p_0}{\sum q_0 p_0}$$

This product is not, in general, equal to the index number of expenditure, $\dfrac{\sum p_n q_n}{\sum p_0 q_0}$.


(b) With current year quantity as weight (Paasche's Method)

$$I_P = \frac{\sum p_n q_n}{\sum p_0 q_n}$$

Interchange the factors p and q. The formula becomes

$$I_Q = \frac{\sum q_n p_n}{\sum q_0 p_n}$$

This is the index number of quantity with the current year price as weight. The product

$$I_P \times I_Q = \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum q_n p_n}{\sum q_0 p_n}$$

is not, in general, equal to the index number of expenditure, which is given by the formula

$$\frac{\sum p_n q_n}{\sum p_0 q_0}$$


We have so far seen that the two formulae discussed above do not satisfy the time reversal and factor reversal tests. It can, however, be proved from the formula that Fisher's index number satisfies both these tests, and this may be the reason why it is called an ideal index number.


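Let us test this formula with reference to the factor reversal test.

$$P = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}$$

Interchange the factors p and q. The formula becomes

$$Q = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}}$$

The product becomes

$$P \times Q = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}}$$

Since $\sum q_0 p_n = \sum p_n q_0$ and $\sum q_n p_0 = \sum p_0 q_n$, these terms cancel, leaving

$$P \times Q = \sqrt{\frac{\left( \sum p_n q_n \right)^2}{\left( \sum p_0 q_0 \right)^2}} = \frac{\sum p_n q_n}{\sum p_0 q_0}$$

which is nothing but the index number for the expenditure.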
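The factor reversal test can be checked numerically in the same way. This Python sketch assumes the four-commodity data of the earlier examples and keeps the indices as ratios; the helper names, including `swap_factors`, are hypothetical.

```python
import math

# (q_0, q_n, p_0, p_n) from the four-commodity example used earlier
data = [
    (20, 24, 2.00, 2.50),
    (5, 8, 4.00, 5.00),
    (10, 12, 5.00, 7.25),
    (12, 16, 1.00, 1.25),
]


def laspeyres(rows):
    return sum(pn * q0 for q0, qn, p0, pn in rows) / sum(p0 * q0 for q0, qn, p0, pn in rows)


def paasche(rows):
    return sum(pn * qn for q0, qn, p0, pn in rows) / sum(p0 * qn for q0, qn, p0, pn in rows)


def fisher(rows):
    return math.sqrt(laspeyres(rows) * paasche(rows))


def swap_factors(rows):
    """Interchange prices and quantities: p becomes q and q becomes p."""
    return [(p0, pn, q0, qn) for q0, qn, p0, pn in rows]


# Index number of expenditure, sum p_n q_n / sum p_0 q_0, as a ratio
value_ratio = sum(pn * qn for q0, qn, p0, pn in data) / sum(p0 * q0 for q0, qn, p0, pn in data)
print(f"value index ratio = {value_ratio:.4f}")
for formula in (laspeyres, paasche, fisher):
    product = formula(data) * formula(swap_factors(data))
    print(f"{formula.__name__:<10} P x Q = {product:.4f}")
```

For these data the value ratio is about 1.6967; Laspeyres' product overshoots it and Paasche's undershoots it, while Fisher's product matches it up to rounding error.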


At present, index numbers are published by various departments, for example cost of living index numbers, consumer price index numbers, wholesale price index numbers, index numbers of agricultural production and index numbers of industrial production.


Construction of Cost of Living Index Number 


We know that the prices of various commodities are not constant but keep changing from time to time. Hence we may be interested in knowing how far the changes in the prices of commodities affect the living of the people. This can be done with the help of cost of living index numbers.


Cost of living index numbers are designed to measure the average change, from year to year, in the cost of maintaining a given standard of living. The cost of living index number is computed by comparing the prices paid by the consumers of a particular class of people living in a particular region, in two different periods of time, for a fixed set of goods and services (for a given standard of living) representing their level of living.


Different classes or groups of people consume different types of commodities, and even the same commodities are not consumed in the same proportion by different classes of people. Hence separate index numbers are calculated to measure the effects of changes in the prices of various commodities on the cost of living of different classes of people. Similarly, separate index numbers are prepared for the same class of people living in different places, because prices and patterns of consumption vary from place to place.


It should be clearly understood that the cost of living index number does not measure the actual cost of living. It only tells us whether a particular class of people in a particular region have to pay more or less for a particular standard of living at a particular time, when compared with a particular base period. In other words, it gives the relative change that has taken place in the cost of living when compared with the base period. As it is a relative measure with reference to a base period of the same place, and not of a common fixed place, it is not advisable to compare the cost of living index numbers of two different places and draw conclusions about the actual cost of living. We can compare the relative changes but not the actual costs, since the actual costs in the base period may not be the same in the two places.


Construction of cost of living index number 

There are five main steps in the construction of a cost of living index number.

1. Deciding the class of people for whom the index number has to be constructed.

2. Choice of base.

3. Selection of commodities.

4. Determination of weights.

5. Collection of retail price quotations.


1. Deciding the class of people for whom the index number is required

First we have to decide the class of people, for example industrial workers, agricultural workers or Government servants, for whom the index number is to be constructed. It is very essential to decide this in clear terms. Besides the class of people, the area should also be clearly decided.
2. Choice of Base

As the index number is a relative measure, we should decide the base year with which the present period has to be compared. Generally, the base period should be a normal period, free from any serious effects on prices due to abnormal conditions such as scarcity or abundance.


3. Selection of commodities to be included in the cost of living index number

After selecting the base period, we have to consider the various commodities that are to be taken into account in the construction of the index number. Generally, these are determined by conducting a Family Budget Survey of the class of people concerned. Since a complete survey is not feasible, a sample survey is adopted for this purpose, and the sample is selected on a random basis to avoid bias. The purpose of this survey or enquiry is to find out how much an average family of the particular class spends on different items of consumption. Generally, the expenditure on the various items is broadly classified into the following major groups:

i. Food articles; ii. Clothing; iii. Fuel and Lighting; iv. House rent; v. Miscellaneous.

These major groups are divided into minor groups, and the minor groups are further divided into smaller groups, in such a way that each small group consists of a list of commodities coming under that group.


The family budget enquiry gives the following information:

i. The nature, quality and quantity of each commodity consumed by the people.

ii. Their retail prices.

iii. The proportion of the expenditure on a particular item to the total expenditure on all the items in its group.

iv. The proportion of the expenditure on a particular group to the total expenditure on all the groups.

With the help of the above details, the commodities to be included in the construction of the cost of living index are listed. The commodities selected should be those generally purchased by the class of people for whom the cost of living index is constructed. After compiling the family budgets, an average budget is drawn up, and this is taken as the standard for that particular class of people.


Collection of Retail Price Quotations

In the construction of cost of living index numbers, we consider only the retail prices and not the wholesale prices of the commodities, since consumers purchase commodities in small quantities at retail. But the collection of retail prices is tedious and difficult, because retail prices vary considerably from shop to shop even in the same locality.

Therefore, special care has to be taken in the collection of prices. We should try to collect the prices from the shops from which most people buy. Further, the prices should be representative: where there are several varieties of the same commodity, prices should be taken for the varieties that are mostly purchased by the people. Prices can be collected through specially trained agents who actually observe the transactions. The retail price quotations collected are then averaged to give an average price for each of the items included in the construction of the index number.


Determination of weights

The relative importance of the various items is not the same for different classes of people. Hence the cost of living index number is always weighted with reference to the importance of the commodities. The relative importance of the commodities is decided on the basis of the expenditure incurred on them, or the quantity consumed, as reflected in the average family budget.
Construction of Index Number

Generally, Laspeyres' formula is used in the construction of the cost of living index number:

$$\text{Cost of Living Index} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$

Here the base year quantities of the commodities are taken as the weights.


The current year price of each commodity is multiplied by the base year quantity of that commodity to give its value. In this manner the values of all the commodities are worked out, and their grand total at the current year price level, $\sum p_n q_0$, is obtained. In the same way, the total expenditure in the base year on all the commodities at the base year price level, $\sum p_0 q_0$, is worked out; this figure is also available indirectly from the family budget survey. The current year total is then divided by the base year total, and the ratio multiplied by 100 gives the cost of living index number for the current year. This method is known as the Aggregate Expenditure Method.


Instead of the aggregate expenditure method we can also use price relatives, and the formula will be as follows:

$$\text{Cost of Living Index} = \frac{\sum \left( \dfrac{p_n}{p_0} \times p_0 q_0 \right)}{\sum p_0 q_0} \times 100 = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$$


In this process the index number obtained will be a weighted 
average of price relatives taking the base year expenditure of each 
commodity as the weight . 


In this method the expenditure under each item is not calculated as in the first method. Instead, the current year price of each commodity is divided by the base year price of that commodity, so that the current year price is expressed as a ratio or relative. This price relative is then multiplied by its weight, which is the value of that commodity in the base year ($p_0 q_0$) and is therefore fixed. The sum of the weighted price relatives of all the items is divided by the total expenditure in the base year, $\sum p_0 q_0$, which is also fixed. This ratio is then multiplied by 100 to express the cost of living index as a percentage of the base year level. Both methods give the same result.
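Both procedures described above can be written side by side in Python to confirm that they agree. The basket below is hypothetical: its quantities and prices are borrowed from the earlier rice, oil, cloth and coconut example only for illustration, and the function names are assumptions.

```python
# Hypothetical basket: (base year quantity q_0, base price p_0, current price p_n)
basket = {
    "Rice":    (20, 2.00, 2.50),
    "Oil":     (5, 4.00, 5.00),
    "Cloth":   (10, 5.00, 7.25),
    "Coconut": (12, 1.00, 1.25),
}


def aggregate_expenditure_index(items):
    """Aggregate expenditure method: sum p_n q_0 / sum p_0 q_0 x 100."""
    current_cost = sum(q_0 * p_n for q_0, p_0, p_n in items)  # base basket at current prices
    base_cost = sum(q_0 * p_0 for q_0, p_0, p_n in items)     # base basket at base prices
    return current_cost / base_cost * 100


def price_relatives_index(items):
    """Weighted average of price relatives, weights = base year expenditure p_0 q_0."""
    weighted = sum((p_n / p_0) * (p_0 * q_0) for q_0, p_0, p_n in items)
    total_weight = sum(p_0 * q_0 for q_0, p_0, p_n in items)
    return weighted / total_weight * 100


items = list(basket.values())
a = aggregate_expenditure_index(items)
b = price_relatives_index(items)
print(f"{a:.1f} {b:.1f}")  # both methods agree
```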


A hypothetical example of the construction of the index is given below for the purpose of illustration.


Double Weighting 

Generally, the cost of living index number is computed by a system of double weighting. We have already stated that the commodities considered in the construction of index numbers are broadly divided into five major groups, namely i. Food; ii. Fuel and lighting; iii. Clothing; iv. House rent and v. Miscellaneous. First, the index number for each group is computed separately. The index number of each group is then multiplied by the weight of that group, the weight being the percentage of the expenditure on the group to the total expenditure on all the groups. The sum of these weighted group indices, divided by the sum of the weights (100), gives the cost of living index number.


COMPUTATION OF INDEX NUMBER BASED ON THE WEIGHTED AVERAGE

S.No.   Price in the   Price in the     Price            Expenditure in   Percentage of      Price relative
of the  base period    current period   relative         the base period  the expenditure    x percentage
item    $p_0$, Rs.     $p_n$, Rs.       $P = p_n / p_0$  $p_0 q_0$, Rs.   to total, $W$      $P \times W$
(1)     (2)            (3)              (4)              (5)              (6)                (7)
 1      10             15               1.50             300              20                 30.00
 2       8             10               1.25             240              16                 20.00
 3       6              9               1.50             120               8                 12.00
 4       4              5               1.25              90               6                  7.50
 5       7             14               2.00              60               4                  8.00
 6       5              6               1.20              90               6                  7.20
 7      10             12               1.20              90               6                  7.20
 8       6              9               1.50              75               5                  7.50
 9      12             15               1.25              45               3                  3.75
10       7             21               3.00              60               4                 12.00
11      10             15               1.50             120               8                 12.00
12       5             10               2.00              60               4                  8.00
13       5              6               1.20              60               4                  4.80
14       3              6               2.00              30               2                  4.00
15       2              3               1.50              60               4                  6.00
Total                                   23.85            1500             100                149.95

$$\text{Cost of Living Index} = \frac{\sum PW}{\sum W} \times 100 = \frac{149.95}{100} \times 100 = 149.95$$
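The table can be reproduced with a short Python sketch that starts from the base period expenditures, derives the percentage weights $W$, and forms the weighted average of the price relatives. The list layout and variable names are assumptions for this illustration; the figures are those of the table.

```python
# (p_0, p_n, base period expenditure p_0 q_0) for items 1 to 15 of the table above
items = [
    (10, 15, 300), (8, 10, 240), (6, 9, 120), (4, 5, 90), (7, 14, 60),
    (5, 6, 90), (10, 12, 90), (6, 9, 75), (12, 15, 45), (7, 21, 60),
    (10, 15, 120), (5, 10, 60), (5, 6, 60), (3, 6, 30), (2, 3, 60),
]

total_expenditure = sum(e for _, _, e in items)
sum_w = 0.0   # sum of the percentage weights W
sum_pw = 0.0  # sum of price relative x W
for p_0, p_n, expenditure in items:
    w = expenditure / total_expenditure * 100  # percentage of total expenditure
    p = p_n / p_0                              # price relative P
    sum_w += w
    sum_pw += p * w

index = sum_pw / sum_w * 100
print(f"Total expenditure {total_expenditure}, sum W {sum_w:.0f}, "
      f"sum PW {sum_pw:.2f}, index {index:.2f}")
```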
Exercise

(1) Calculate the index numbers.


Base Year 

Rs . 


Current Year 

Rs . 


Base Year 

Rs. 


Current Year 

Rs . 


30 


12 


(i) 20 

26 


( ii ) 10 

15 


20 


39 
45 


20 


25 


30 : 
22 


35 


25 


30 


25 


36 


30 


35 


(2) Calculate the price index by using Paasche's and Laspeyres' formulae.

      Base Year              Current Year
      Price    Quantity      Price    Quantity
      5        5             6        4
      6        6             9        5
      9        2             15       2
      12       1             16       2
      8        3             15       5


(3) Write an essay on the construction of cost of living index numbers.

(4) Define index number. Write an essay on the construction of price index numbers.

(5) What are the different formulae used in the construction of index numbers? Discuss the merits and demerits of these index numbers.

(6) What are the different tests that are generally adopted? Discuss their suitability in the case of the different formulae adopted.




